Abstract.

We show that the first 10 eigencomponents of the Karhunen-Loeve expansion or Principal Component Analysis 
(PCA) provide a robust classification scheme for the identification of stars, galaxies and quasi-stellar objects from 
multi-band photometry. To quantify the efficiency of the method, realistic simulations are performed which match 
the planned Large Zenith Telescope survey. This survey is expected to provide spectral energy distributions with 
a resolution R ~ 40 for ~ 10^6 galaxies to R < 23 (z ~ 1), ~ 10^4 QSOs, and ~ 10^5 stars.

We calculate that for a median signal-to-noise ratio of 6, 98% of stars, 100% of galaxies and 93% of QSOs are 
correctly classified. These values increase to 100% of stars, 100% of galaxies and 100% of QSOs at a median 
signal-to-noise ratio of 10. The 10-component PCA also allows measurement of redshifts with an accuracy of
σ_Res ≲ 0.05 for galaxies with z ≲ 0.7, and of σ_Res ≲ 0.2 for QSOs with z ≳ 2, at a median signal-to-noise
ratio of 6. At a median signal-to-noise ratio of 20, σ_Res ≲ 0.02 for galaxies with z ≲ 1 and for QSOs with z ≳ 2.5
(note that for a median S/N ratio of 20, the bluest/reddest objects will have a signal-to-noise ratio of 2 in 
their reddest/bluest filters). This redshift accuracy is inherent to the R ~ 40 resolution provided by the set of 
medium-band filters used by the Large Zenith Telescope survey. It provides an accuracy improvement of nearly 
an order of magnitude over the photometric redshifts obtained from broad-band BVRI photometry. 



Key words. galaxies: distances and redshifts - galaxies: fundamental parameters - methods: statistical - quasars:
general - stars: fundamental parameters - techniques: photometric



1. Introduction 

The galaxy luminosity function (hereafter LF), defined as the number density of galaxies per unit interval of
luminosity, is a fundamental statistical tool required to model the formation, evolution and clustering of the
galaxies. At z ≲ 1, it is well established that the LF depends on the galaxy morphological type (Binggeli et al.
1988; Marzke et al. 1998; Loveday et al. 1999), and that it evolves with redshift (Lilly et al. 1996; Lin et al.
1999; Bromley et al. 1998; de Lapparent et al. 2002a). Measurement of the LF thus requires large galaxy samples
which can be separated into several morphological, spectral or color classes and in redshift intervals. Among the
next generation redshift surveys, only 3 will be able to probe the galaxy LF to z ~ 1. The DEEP redshift survey
using Keck telescopes (Davis & Faber 1998) and the VIRMOS redshift survey using the VLT (Le Fevre et al. 1998) are
optimized for clustering analyses, but they will also provide measurements of the galaxy LFs, even if the 2 surveys
shall suffer from various aperture effects and calibration difficulties. The third survey is the Large Zenith
Telescope (hereafter LZT) survey, which is optimized for the measurements of the galaxy LFs to z ~ 1.

An essential step in the measurement of the LF for a systematic survey is the classification of objects as stars,
galaxies, QSOs, etc. For the galaxies and QSOs, an estimate of the redshift for each object is also desired. This
paper proposes and tests a classification approach based on the Principal Component Analysis (PCA), also known
as Karhunen-Loeve expansion - the underlying principles of the PCA were independently derived by Karhunen
(1947) and Loeve (1948). The PCA is a non-parametric approach which has been successfully used for a variety
of astronomical applications including stellar classification from photometric data (Deeming 1964; Scarfe 1966;
Whitney 1983a,b) and from spectra (Storrie-Lombardi et al. 1994; Ibata & Irwin 1997; Bailer-Jones et al. 1998;
Singh et al. 1998), galaxy classification from photometric data (Watanabe et al. 1985) and from galaxy spectra
(Connolly et al. 1995a,b; Galaz & de Lapparent 1998; Connolly & Szalay 1999; Ronen et al. 1999), and for galaxy
redshift measurements (Glazebrook et al. 1998); other fields of application are solar flare observations (Teuber
et al. 1979), asteroid spectra (Britt et al. 1992), interstellar medium emission lines (Heyer & Schloerb 1997),
gamma ray bursts (Bagoly et al. 1998), and active galaxies (Mittaz et al. 1990; Dultzin-Hacyan & Ruano 1996;
Turler & Courvoisier 1998).

Table 1. Characteristics of the LZT

Longitude                      122°34'22.4"
Latitude                       49°17'15.5"
Altitude                       403 m
Median seeing                  0.9"
Telescope diameter             6 m
Focal length                   10 m
Diameter of corrected field    24'
Detector                       Thinned 2048 x 2048 CCD
Image scale                    0.495" / pixel
Surveyed area                  17' x 280° ~ 80 deg^2
Integration time               64.76 sec
Limiting R magnitude           25.4




All previous spectral classification attempts using the 
PCA employed either multicolor photometry (usually 
fewer than 10 color bins, e.g. UBVRIJHK) or medium 
to high-resolution spectroscopy (resolution R > 500). The 
PCA has not been tested on spectral energy distributions 
(SED) with R ~ 40 because no such data existed until the UBC-NASA Multi-Narrowband Survey (Hickson &
Mulrooney 1998; Cabanac et al. 1998).

In this paper we use simulations based on the LZT survey parameters to evaluate the PCA method. Section 2
describes our simulations of mock LZT catalogues, Sect. 3 describes the approach used to classify the objects using
the PCA, Sect. 4 shows the efficiency of the classification and Sect. 5 discusses the redshift accuracies which can
be obtained directly from the PCA or from a composite method similar to that described by Glazebrook et al.
(1998).

2. Mock catalogues 

In order to be able to simulate the classification efficiency 
of the PCA, we create mock catalogues which match as 
closely as possible the observations of the LZT. A calibra- 
tion of the PCA on real data is still necessary to extract 
sensible physical information. The problem is addressed in
another paper in preparation (Cabanac et al. 2002) using
data from the NASA Orbital Debris Observatory (NODO;
Hickson & Mulrooney 1998).



2.1. The Large Zenith Telescope 



The Large Zenith Telescope (Hickson et al. 1998) is a 6-m liquid-mirror telescope under construction near
Vancouver, Canada. With first light expected in 2002, it will conduct drift-scan surveys of a strip of sky
centered at 49° 17' declination. The main characteristics of the LZT are given in Table 1. It is a
zenith-pointing telescope equipped with a 2k x 2k thinned CCD (0.5"/pix) at the prime focus, which scans a
17' x 110° strip of sky at sidereal rate. Forty medium-band filters are employed which have
logarithmically-separated central wavelengths from 4000 Å to 1 µm, and one broad-band U filter. The survey is
expected to yield calibrated spectral energy distributions (SEDs) for ~ 10^6 galaxies to R < 23 (z ~ 1),
~ 10^4 QSOs and ~ 10^5 stars.

Fig. 1. The transmission curves of the medium-band filters + U band used by the LZT (top graph) and the central
wavelengths λ_center (crosses) and FWHM of the filters multiplied by 10 (bottom graph). The bottom graph is
annotated with λ_center(n) = 4045 e^{0.0229(n-1)}.

2.2. Stars

A phenomenological model producing the observed relative amounts of blue and red stars is sufficient for the
mock catalog. We use the Galaxy model of Bahcall & Soneira (Bahcall 1986)
(http://www.sns.ias.edu/~jnb/Html/galaxy.html) to derive star counts and colors. Table 2 lists the star counts
and colors predicted for a limiting magnitude of R < 23. The template spectra are collected from the star
catalog of Pickles (1998) available at CDS (http://cdsweb.u-strasbg.fr/). This catalog contains 131 stellar
spectra in the range 1150-10620 Å, spanning all star temperatures (O to M), types (I to V), and metallicities
(normal, rich, or poor). The template spectra are filtered using the transmission curves of Fig. 1. Figure 2
shows examples of template stellar spectra.
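
As a rough illustration of how a template is reduced to an LZT-like SED, the sketch below computes logarithmically spaced filter centers and averages a spectrum in each band. It assumes the relation λ_center(n) = 4045 exp[0.0229 (n - 1)] Å as read from the annotation of Fig. 1, with n running from 1 to 40, and it replaces the real transmission curves by top-hat bands of width λ/40 (an assumption motivated by R ~ 40, not the actual filter profiles). The function names are illustrative only.

```python
import math

N_FILTERS = 40        # number of medium-band filters quoted in the text
LAMBDA_0 = 4045.0     # Angstrom; assumed from the Fig. 1 annotation
LOG_STEP = 0.0229     # assumed from the Fig. 1 annotation


def filter_centers(n_filters=N_FILTERS):
    """Central wavelengths (Angstrom) for filters n = 1..n_filters."""
    return [LAMBDA_0 * math.exp(LOG_STEP * (n - 1)) for n in range(1, n_filters + 1)]


def band_average(wavelengths, fluxes, center, resolution=40.0):
    """Mean flux inside a top-hat band of full width center / resolution.

    This is a crude stand-in for integrating over a real transmission curve.
    """
    half_width = 0.5 * center / resolution
    inside = [f for w, f in zip(wavelengths, fluxes) if abs(w - center) <= half_width]
    return sum(inside) / len(inside) if inside else float("nan")


centers = filter_centers()

# Toy input: a flat spectrum sampled every 5 Angstrom from 3900 to 10395 Angstrom.
wave = [3900.0 + 5.0 * i for i in range(1300)]
flux = [1.0] * len(wave)

# One averaged flux value per filter: the low-resolution SED.
sed = [band_average(wave, flux, c) for c in centers]
print(f"first center: {centers[0]:.0f} A, last center: {centers[-1]:.0f} A")
```

With these assumed coefficients the centers run from about 4000 Å to just under 1 µm, consistent with the range quoted for the filter set.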

The simulation proceeds as follows. For each color bin in Table 2, we randomly draw the required number of
templates (number of stars given in Table 2) from the Pickles sub-sample of giant (or dwarf) templates. For
instance, bin 0.5 < B - V < 0.6 has 22 stars in the disk component (0.4% giants, that is, zero giant stars), and
293 stars in the spheroid component (21% giants, that is, 62 stars). Before adding each drawn template to the
final catalog, we add noise weighted by the expected LZT instrumental efficiency (see below). The range of
median signal-to-noise ratio in the continuum of the generated spectra is 6 - 100.

The final mock catalog contains ~ 3370 stellar templates, and faithfully reproduces the giant/dwarf fraction
and the B - V color distribution of Table 2.

Table 2. Star counts and B - V colors predicted by the Bahcall-Soneira model of the Galaxy (Bahcall 1986) for a
limiting magnitude R < 23. The color noise is set to 0.1 mag. The spheroid giant branch is that of M13.

                    Total      Disk   Spheroid
Star counts          3371      1445       1926
Mean (B - V)         1.07      1.46       0.76
Star fraction                   43%        57%
Giant fraction        12%      0.4%        21%

(B - V) bin         Total      Disk   Spheroid
0.1-0.2               3.3       0.2        3.1
0.2-0.3              31.6       0.7       30.8
0.3-0.4             135.7       2.6      133.0
0.4-0.5             277.6       8.6      268.9
0.5-0.6             314.3      21.5      292.7
0.6-0.7             264.8      35.0      229.8
0.7-0.8             220.8      37.0      183.7
0.8-0.9             183.7      32.4      151.1
0.9-1.0             154.7      29.5      125.1
1.0-1.1             146.4      28.8      117.5
1.1-1.2             153.8      32.6      121.2
1.2-1.3             166.8      48.4      118.3
1.3-1.4             192.0     101.3       90.6
1.4-1.5             265.3     220.9       44.3
1.5-1.6             332.2     319.9       12.2
1.6-1.7             270.7     268.5        2.1
1.7-1.8             150.7     150.3        0.3
1.8-1.9              69.0      69.0        0.0
1.9-2.0              26.8      26.8        0.0
2.0-2.1               8.4       8.2        0.0
2.1-2.2               1.7       1.7        0.0

Fig. 2. 9 templates of stellar spectra from the Pickles library (1998), containing 131 spectra. The high-resolution
spectra (R ~ 2000) are shown (grey line) along with the LZT-filtered flux (o). Spectral types are indicated.

2.3. Galaxies

2.3.1. Luminosity functions

Realistic simulations of galaxy morphology and redshift distributions are crucial but difficult because they
require a prior knowledge of the morphological (or spectral) type-dependent LFs. The "intrinsic" LFs based on
morphological type are only measured locally at z ≲ 0.03 (Binggeli et al. 1988; Marzke et al. 1994; Jerjen &
Tammann 1997; Marzke et al. 1998; Marinoni et al. 1999). Recent surveys to z ~ 0.5 - 1, in which galaxies are
separated in 2 or 3 classes using colors or spectral measures, indicate that the intrinsic LFs evolve with
redshift (Lilly et al. 1995; Heyl et al. 1997; Lin et al. 1999; de Lapparent et al. 2002a), but the relation
with the nearby LFs based on morphological type is not straightforward.

To simulate an approximate LZT galaxy catalog, we must choose a model for the intrinsic LFs in the redshift
range of the LZT survey (0 ≲ z ≲ 1). Because the LZT survey will be most sensitive in the red filters, Table 3
lists the galaxy LFs φ(M_R) of the existing red-band selected surveys which extend at or beyond z ~ 0.5. The
deep blue-band selected survey by Heyl et al. (1997) is therefore not considered. The LFs in Table 3 are
defined by the Schechter parameterization (Schechter 1976),

\phi(M)\,dM = 0.4 \ln 10 \; \phi^{*} \; 10^{0.4(M^{*}-M)(\alpha+1)} \exp\left[-10^{0.4(M^{*}-M)}\right] dM ,   (1)

where M is the absolute magnitude, and φ*, M*, and α are the Schechter parameters. All parameters in Table 3
correspond to the case of a flat Universe with Λ = 0 and H0 = 100 h km s^-1 Mpc^-1.

The LFs in Table 3 are binned into early and late-type galaxies, and the measured evolution in the Schechter
parameters is indicated for the samples in which it is detected. For the CNOC2 survey (Lin et al. 1999), the
first class is their "Early+Intermediate" class, and the second class is their "Late" class; these classes are
obtained by least-square fit to the SEDs by Coleman et al. (1980). For the ESS (de Lapparent et al. 2002b), the
spectral classification is obtained by a PCA calibrated on the Kennicutt sample of nearby galaxy spectra
(Kennicutt 1992). Lilly et al. (1995) divide the CFRS sample into a population redder than Sbc, defined as
having rest-frame [U - V]_AB = 1.38 ([U - V]_AB ~ [V - I]_AB at z ~ 0.5, V_AB = V, and I_AB = I + 0.48), and a
population bluer than Sbc, defined as the remaining galaxies. In Table 3 the Schechter LF parameter M* is
listed in the Rc Cousins band, used by the CNOC2 (Lin et al. 1999) and the ESS (de Lapparent et al. 2002a,b).
For the CFRS (Lilly et al. 1995), we use the M* values derived by the authors in the B_AB band for the galaxies
with 0.2 < z < 0.5, and we convert them into the Rc band using B_AB = B - 0.17 (Lilly et al. 1995), and
assuming B - Rc = 1.35 for galaxies redder than Sbc, and B - Rc = 0.85 for galaxies bluer than Sbc (Fukugita
et al. 1995). Note that we have also converted the CFRS M* and φ* of Lilly et al. (1995) from
H0 = 50 km s^-1 Mpc^-1, used by the authors, to H0 = 100 h km s^-1 Mpc^-1 used here.

For comparison with a red-band selected survey at low redshift (0 < z < 0.2), we also quote in Table 3 the LCRS
survey (Bromley et al. 1998). The r_Gunn magnitudes are converted into Rc using r_Gunn - Rc = 0.36 (Fukugita
et al. 1995). For this survey, the spectral classification is obtained by a PCA, but it is not calibrated on
spectra of known morphology. The grouping of galaxies into early and late-types must thus be done arbitrarily.
In Table 3 we show the LFs obtained by grouping galaxies in clans 1 + 2 + 3 + 4 into the early-types, and those
in clans 5 + 6 as the late-types. The corresponding listed Schechter parameters M*_R and α for the early and
late-type galaxies are obtained by averaging the Schechter parameters over the considered clans. We calculate
the corresponding amplitudes φ* for the 2 average LFs by adjusting their integral in the absolute magnitude
interval -23 < M_R < -16.5 to the sum of the observed numbers of galaxies in the considered classes; a total
survey area of 462 deg^2 is used (Shectman et al. 1996). The systematic bias against low surface-brightness
galaxies which is present in the LCRS spectroscopic survey tends to exclude late-type galaxies. This explains
the relatively low φ*(z ~ 0) for the late-type galaxies in the LCRS as compared to the other surveys (by a
factor of 3 to 5). This effect might also be present in the LCRS early-type class (defined as clans
1 + 2 + 3 + 4), and could explain the ~ 50% lower φ* compared to that for the deeper surveys (ESS and CFRS).
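
The amplitude adjustment described above relies on integrating the Schechter function of Eq. (1) over a magnitude interval. The sketch below is a minimal illustration of that step, assuming the magnitude form φ(M) = 0.4 ln10 φ* x^(α+1) exp(-x) with x = 10^{0.4(M* - M)}, a simple trapezoidal rule, and a target number density supplied by the caller. The function names and the integration scheme are illustrative choices, not the procedure actually used for Table 3, which also requires the survey volume and selection function.

```python
import math


def schechter_mag(M, phi_star, M_star, alpha):
    """Schechter LF per unit absolute magnitude, in the units of phi_star."""
    x = 10.0 ** (0.4 * (M_star - M))  # luminosity in units of L*
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)


def number_density(phi_star, M_star, alpha, M_bright=-23.0, M_faint=-16.5, steps=2000):
    """Trapezoidal integral of the LF between M_bright and M_faint."""
    h = (M_faint - M_bright) / steps
    total = 0.5 * (schechter_mag(M_bright, phi_star, M_star, alpha)
                   + schechter_mag(M_faint, phi_star, M_star, alpha))
    for i in range(1, steps):
        total += schechter_mag(M_bright + i * h, phi_star, M_star, alpha)
    return total * h


def rescale_phi_star(phi_star_trial, density_target, M_star, alpha):
    """The integral is linear in phi*, so one rescaling matches the target density."""
    return phi_star_trial * density_target / number_density(phi_star_trial, M_star, alpha)


# Mock LZT parameters quoted in Table 3 (phi* in h^3 Mpc^-3).
n_early = number_density(0.03, -20.42, -0.16)
n_late = number_density(0.01, -20.18, -1.19)
print(f"early: {n_early:.4f}  late: {n_late:.4f}  (h^3 Mpc^-3)")
```

Because the integral scales linearly with φ*, a single rescaling is enough once M* and α are fixed, which is why the averaged M*_R and α can be set first and φ* adjusted afterwards.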



We also show in Table 3 the LFs obtained for a different grouping of the LCRS clans: early-type are galaxies in
clans 1 + 2 + 3, and late-type are galaxies in clans 4 + 5 + 6. The resulting variations in the Schechter
luminosity functions illustrate the sensitivity of the intrinsic LFs to the scheme used for galaxy
classification. The systematically low φ* for both galaxy classes in either grouping (1 + 2 + 3 and 4 + 5 + 6,
or 1 + 2 + 3 + 4 and 5 + 6) as compared to the deeper surveys CNOC2, ESS, and CFRS (see Table 3) further
suggests that the LCRS suffers from selection effects causing an under-sampling of the galaxy populations.

Table 3. Parameters of the Schechter luminosity functions measured by various red-band galaxy surveys with
0 < z < 0.5 (see text and Eq. 1).

Survey name        Galaxy type         N_gal     M*_R - 5 log h    α                φ* (a)    Evolution
and limits

CNOC2 (b)
0.1 ≲ z ≲ 0.6      Earlier than Sbc     1128     -20.61 ± 0.11     -0.44 ± 0.10     0.023     M*(z) ~ M*(z = 0.3) (c) - 0.7(z - 0.3)
Rc < 21.5          Later than Sbc       1012     -20.11 ± 0.18     -1.34 ± 0.12     0.006     φ*(z) ~ φ*(z = 0) (d) (1 + 3.17z)

ESS (e)
0.1 ≲ z ≲ 0.5      E+S0+Sa+Sb            436     -20.49 ± 0.11     -0.31 ± 0.14     0.031
Rc < 20.5          Sc+Sm/Im              181     -19.84 ± 0.24     -1.63 ± 0.22     0.006     φ*(z) ~ φ*(z = 0.15) (d) [1 + 3.5(z - 0.15)]

CFRS (f)
0.2 ≲ z ≲ 1.0      Redder than Sbc        99     -20.12 ± 0.25     0.00             0.030
I < 22.5           Bluer than Sbc        110     -20.01 ± 0.25     -1.34            0.010     1-mag brightening in M*_R, modelled here as
                                                                                              M*_R(z) ~ M*_R(z = 0.3) (c) - 2(1 - e^{0.3-z})

LCRS (g)
0 ≲ z ≲ 0.2        clans 1 + 2 + 3 + 4  16146    -20.42 ± 0.02     -0.13 ± 0.05     0.018
r_Gunn < 17.7      clans 5 + 6           2132    -20.38 ± 0.08     -1.58 ± 0.07     0.002

LCRS (g)
0 ≲ z ≲ 0.2        clans 1 + 2 + 3      12936    -20.50 ± 0.03     0.03 ± 0.05      0.012
r_Gunn < 17.7      clans 4 + 5 + 6       5342    -20.32 ± 0.06     -1.27 ± 0.05     0.006

1-deg^2 mock LZT
0.2 ≲ z ≲ 1.2      early-type          ~17300    -20.42            -0.16            0.03
Rc < 23.0          late-type           ~12700    -20.18            -1.19            0.01      M*(z) ~ M*(z = 0.5) (c) - 2(1 - e^{0.5-z})

Notes:
(a) φ* is in units of h^3 Mpc^-3. The quoted uncertainties in φ* are typically of order 0.005. This is a lower
limit if one takes into account the fluctuations caused by the large-scale clustering inside a given survey. For
surveys which detect evolution in φ*, this Col. lists the value φ*(z = 0).
(b) Lin et al. (1999); the LFs are measured in the interval -23 < M_R < -17.
(c) Value given in Col. M*_R - 5 log h.
(d) Value given in Col. φ*.
(e) de Lapparent et al. (2002a); the LFs are measured in the interval -23 < M_R < -16.
(f) Lilly et al. (1996); the LFs are measured in the interval -23.8 < M_B(AB) < -19 for red galaxies, and
-23 < M_B(AB) < -19.5 for blue galaxies.
(g) Bromley et al. (1998); the LFs are measured in the interval -23 < M_R < -16.5.

The LFs for early and late-type galaxies at z ≲ 0.5 listed in Table 3 show reasonable agreement among the
various samples. The dominant source of variation in the LFs for each galaxy type is the difference in the
definitions of the spectral classes among the samples. This is reflected in the varying ratio of early to
late-type galaxies among the surveys (see Col. N_gal in Table 3). Kochanek et al. (2000) recently showed that
the mixing of the morphological classes which is often present in spectral classification can cause systematic
biases in the parameters of the LFs. The varying selection effects from survey to survey (such as the mentioned
bias in the LCRS against low-surface brightness objects) also contribute to the differences in the LFs.
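
The spread in the early-to-late ratio can be checked directly from the galaxy counts of Table 3. The short sketch below simply divides the two N_gal entries of each survey; the dictionary labels are ours, and the counts are copied from the table.

```python
# (early-type count, late-type count) per survey, copied from Table 3.
n_gal = {
    "CNOC2": (1128, 1012),
    "ESS": (436, 181),
    "CFRS": (99, 110),
    "LCRS (1+2+3+4 / 5+6)": (16146, 2132),
    "LCRS (1+2+3 / 4+5+6)": (12936, 5342),
    "mock LZT": (17300, 12700),
}

# Ratio of early-type to late-type galaxies for each sample.
ratios = {name: early / late for name, (early, late) in n_gal.items()}

for name, ratio in ratios.items():
    print(f"{name:24s} {ratio:5.2f}")
```

The ratio ranges from slightly below 1 for the CFRS to more than 7 for the first LCRS grouping, which illustrates how strongly the class definitions differ between samples.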



The deep surveys quoted in Table 3 detect evolution of the LFs with redshift (Lilly et al. 1995; Lin et al.
1999; de Lapparent et al. 2002a). Understanding the details of this evolution is still a matter of debate. Here,
we list some scenarios proposed in the corresponding articles. In the Lilly et al. (1995) survey, the red
galaxies show no or little density or luminosity evolution in the range 0 ≲ z ≲ 1, whereas the blue galaxies
show a luminosity evolution of at least 1 magnitude in that redshift range. The CNOC2 analysis separates the
luminosity evolution from the density evolution. Early and intermediate-type galaxies show a small luminosity
evolution in the range 0.1 ≲ z ≲ 0.7 (ΔM* ~ 0.5), whereas late-type galaxies show a clear density evolution
with almost no luminosity evolution in the same redshift range. Finally, for the ESS, an evolution in φ* for the
late-type galaxies is detected. These various evolutions are listed in Table 3 in the Col. labeled "Evolution".

Figures 3, 4, and 5 show for each survey displayed in Table 3 the redshift distributions calculated for the
listed Schechter parameters: without evolution (thin dashed line for early-type galaxies, and thin dotted line
for late-type galaxies), and with evolution when it applies (thick dashed line for early-type galaxies, thick
dotted line for late-type galaxies). These predicted redshift distributions are calculated in the case of a flat
Universe with Λ = 0 and H0 = 100 h km s^-1 Mpc^-1, over the 40 deg^2 planned area for the LZT survey, and are
extrapolated to the limiting magnitude of the LZT survey, namely Rc < 23.0. In Fig. 5, we model the observed
evolution of the CFRS blue LF with a brightening in M*_R, defined by the additive term
m(z) = -2[1 - exp -(z - 0.3)]; this yields m = 0.0 for z = 0.3, m = -1.0 for z = 0.6, m = -1.9 for z = 1.0,
and m = -2.7 for z = 1.5. For comparison, the redshift distributions for the early and late-type LFs for the
LCRS divided in clans 1 + 2 + 3 and 4 + 5 + 6 are shown in Fig. 6 (no evolution is considered).

In Figs. 3, 4, and 5, the systematic differences between the evolving early-type/red and late-type/blue galaxy
redshift distributions show common properties. The peak for the late-type distribution is at smaller redshift
than that for the early-type distribution because of the combination of fainter M* and steeper slope α for the
first class of objects. This effect is preserved when evolution of the early-type and/or late-type population is
introduced, despite the fact that all evolution scenarios quoted in Table 3 tend to shift the peaks of the
redshift distributions to higher redshift. The interesting result derived from the comparison of Figs. 3, 4, and
5 is that the 3 parameterizations of the LF evolution lead to similar galaxy redshift distributions in the range
0 ≲ z ≲ 1 at the depth of the LZT survey (Rc < 23.0): the redshift and amplitude at the peak, as well as the
high-redshift fall-off of the redshift distributions, are similar. In contrast, the LCRS produces a
systematically low redshift distribution at this depth, with a peak at ~ 600 galaxies, whereas the no-evolution
curves for the deeper surveys shown in Figs. 3, 4, and 5 all peak in the range ~ 900 - 1200 galaxies; as
already mentioned, this may arise from selection effects in the LCRS.

Fig. 3. Galaxy redshift distribution in 40 deg^2 of the sky to 17 < Rc < 23 in an Einstein-de Sitter Universe,
according to the LFs measured from the CNOC2 data (Lin et al. 1999) and listed in Table 3. The non-evolving and
evolving distributions for the early+intermediate galaxies are shown as a thin dashed line and a thick dashed
line respectively, and for the late-type galaxies, as a thin dotted line and a thick dotted line respectively.
The non-evolving and evolving total distributions are shown as a thin solid line and a thick solid line
respectively.

Fig. 4. Galaxy redshift distribution in 40 deg^2 of the sky to 17 < Rc < 23 in an Einstein-de Sitter Universe,
according to the LFs measured from the ESS data (de Lapparent et al. 2002a,b) and listed in Table 3. The
non-evolving distribution for the early-type galaxies is shown as a thin dashed line, and for the late-type
galaxies, as a thin dotted line. The evolving distribution for the late-type galaxies is shown as a thick dotted
line, and is modeled as φ*(z) ~ φ*(z = 0.15) [1 + 3.5(z - 0.15)]. The non-evolving and evolving total
distributions are shown as a thin solid line and a thick solid line respectively.

Fig. 5. Galaxy redshift distribution in 40 deg^2 of the sky to 17 < Rc < 23 in an Einstein-de Sitter Universe,
according to the LFs measured from the CFRS data (Lilly et al. 1995) and listed in Table 3. The non-evolving
distribution for the red galaxies is shown as a thin dashed line, and for the blue galaxies, as a thin dotted
line. The evolving distribution for the blue galaxies is shown as a thick dotted line, and is modeled as
M*_R(z) ~ M*_R(z = 0.3) - 2(1 - e^{0.3-z}) (see Table 3). The non-evolving and evolving total distributions are
shown as a thin solid line and a thick solid line respectively.

Fig. 6. Galaxy redshift distribution in 40 deg^2 of the sky to 17 < Rc < 23 in an Einstein-de Sitter Universe,
according to the LFs measured from the LCRS data (Bromley et al. 1998) and listed in Table 3. The distribution
for the galaxies in clans 1 + 2 + 3 is shown as a thin dashed line, and as a thin dotted line for galaxies in
clans 4 + 5 + 6. The thin solid line shows the total distribution.

We therefore choose to adopt for all mock catalogues considered here the following LF Schechter parameters:
M*_R = -20.42, α = -0.16, φ* = 0.03 for early-type galaxies; M*_R = -20.18, α = -1.19, φ* = 0.01 for late-type
galaxies (also listed in Table 3). The chosen α for the early-type galaxies, and the M*_R and φ* for both
galaxy types, are within the range of values measured for the 3 R-selected surveys in Table 3: the CNOC2, the
ESS, and the LCRS. The slope α = -1.19 for the late-type galaxies is flatter than the flattest measured slope
(from the LCRS, clans 4 + 5 + 6), in order to limit the number of galaxies to be included in the mock LZT
catalogues, and thus the computing time for the PCA; this value is however only 1.25σ flatter than the slope of
the LF for the CNOC2 late-type galaxies. Having a steeper slope for the late-type galaxies would have a weak
impact on the results presented here. Note that the chosen values of φ* for the early and late-type LFs are
obtained by using an early-type to late-type φ* ratio of 3 (as in the CFRS) and by normalizing the total number
of galaxies per deg^2 with Rc < 23 to the observations: numerous studies give galaxy counts in the R band, and
we use a typical value of ~ 30,000 galaxies/deg^2 (Roche et al. 1996; Metcalfe et al. 2001).
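
The numerical statements in this paragraph can be verified with a few lines of arithmetic. The sketch below uses only values quoted in the text and in Table 3; the variable names are ours.

```python
import math

# Late-type slope adopted for the mock LZT, and the CNOC2 late-type measurement.
alpha_mock_late = -1.19
alpha_cnoc2_late, sigma_cnoc2_late = -1.34, 0.12

# How much flatter the adopted slope is, in units of the CNOC2 uncertainty.
offset_in_sigma = (alpha_mock_late - alpha_cnoc2_late) / sigma_cnoc2_late

# Adopted amplitudes and their early-to-late ratio.
phi_star_early, phi_star_late = 0.03, 0.01
phi_ratio = phi_star_early / phi_star_late

# Approximate galaxy numbers per deg^2 in each class of the mock catalogue.
n_early, n_late = 17300, 12700
n_total = n_early + n_late

print(f"slope offset: {offset_in_sigma:.2f} sigma")
print(f"phi* ratio:   {phi_ratio:.1f}")
print(f"total counts: {n_total} galaxies/deg^2")
```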
Fig. 7. Predicted redshift distribution for the future LZT survey in 40 deg^2 of the sky to 17 < Rc < 23 in an
Einstein-de Sitter Universe. The redshift distributions for the non-evolving early-type and late-type galaxies
are shown as a thin dashed line and a thin dotted line respectively. The corresponding Schechter parameters are
M*_R = -20.42, α = -0.16, φ* = 0.03 for the early-type galaxies, and M*_R(z < 0.5) = -20.18, α = -1.19,
φ* = 0.01 for the late-type galaxies. The expected distribution for the evolving late-type galaxies, modeled by
an M*_R brightening defined by M*_R(z > 0.5) ~ M*_R(z = 0.5) - 2(1 - e^{0.5-z}), is shown as a thick dotted
line. The thin and thick continuous lines show the sums of the curves for the non-evolving early-type galaxies
and the non-evolving and evolving late-type galaxies respectively. The evolving sum is subsequently used for
generating the mock LZT catalogues.



Because the LZT survey will have a depth comparable with that of the CFRS, we adopt an evolution of the LF
resembling that for the CFRS, but which better matches the evolution at z > 1 (as shown in Fig. 7, a
non-negligible fraction of galaxies, 27%, will have z > 1 in the LZT survey). Although the evolution of the
galaxy LFs is poorly known beyond z = 1, photometric redshifts applied to the Hubble Deep Field do provide a
general trend for the evolution of the "total" LF: a steepening of the slope α from -1.3 (z ~ 0.5) to -2
(z ~ 3), and a one magnitude brightening of M* between z ~ 1 and z ~ 3 (Sawicki et al. 1997; Takeuchi et al.
2000). We therefore conservatively assume no evolution of both the early and late-type galaxy LFs in the range
0 < z < 0.5, and add a brightening term m(z) = -2[1 - exp -(z - 0.5)] to M*_R for the LZT blue galaxies with
z > 0.5. The brightening term m(z) gradually changes M*_R by m = -0.83 at z = 1, m = -1.0 at z = 1.19,
m = -1.26 at z = 1.5, m = -1.55 at z = 2, and m = -1.8 at z = 3; it asymptotically reaches its ceiling of
m = -2 at z > 3. The corresponding redshift distribution for the evolving late-type galaxies in the modeled LZT
survey is given in Fig. 7 (thick dotted line); it is combined with the non-evolving early-type redshift
distribution (thin dashed line) to derive the sum of the 2 populations in the evolving model (thick continuous
line). This summed redshift distribution is subsequently adopted for the LZT mock surveys analyzed here. For
comparison, the corresponding non-evolving late-type galaxy distribution (thin dotted line) and non-evolving
total distribution (thin continuous line) are also shown in Fig. 7.
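
The brightening term adopted for the late-type galaxies is simple enough to evaluate directly. The sketch below assumes the form m(z) = -2[1 - exp(-(z - 0.5))] for z > 0.5 and m = 0 below, applied additively to M*_R = -20.18; the function names and default arguments are illustrative.

```python
import math


def brightening(z, z0=0.5, amplitude=2.0):
    """Additive brightening term m(z) in magnitudes; zero at and below z0."""
    if z <= z0:
        return 0.0
    return -amplitude * (1.0 - math.exp(-(z - z0)))


def m_star_late(z, m_star_0=-20.18):
    """Characteristic magnitude M*_R of the late-type LF at redshift z."""
    return m_star_0 + brightening(z)


for z in (0.3, 1.5, 2.0, 3.0):
    print(f"z = {z:.1f}: m = {brightening(z):+.2f}, M*_R = {m_star_late(z):.2f}")
```

The term saturates towards -2 mag at high redshift, so the assumed evolution stays bounded beyond the range where the LF is constrained by observations.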

Over the planned area of 40 deg^2 for the LZT survey, more than ~ 1,000,000 galaxies are expected. Because the
PCA is computer-time consuming, we limit the size of the LZT mock catalogues used in the following analyses to
1 deg^2. These catalogues therefore include ~ 30,000 galaxies, and the numbers of galaxies in each class are
those listed in Table 3.

2.3.2. Galaxy spectral energy distributions 

The galaxy templates are extracted from stellar synthesis libraries. Figure 8 shows 24 templates of E/S0, Sa/Sb,
Sc/Sd, and Sm/Im galaxies for 3 metallicities and 2 Initial Mass Functions (IMF) kindly provided by S. Charlot,
and calculated from the version GISSEL95 of the GISSEL evolutionary code (Bruzual & Charlot 1993). For
early-type galaxies, higher metallicities flatten the slope of the continua. Changing the IMFs induces no effect
on the templates (Fig. 8, frame a). For late-type galaxies, the Salpeter IMF (Salpeter 1955) tends to produce
bluer objects than the Scalo IMF (Miller & Scalo 1979), and the metallicity effect is small (Fig. 8, frames b to
d). The Sa/Sb templates using a Salpeter IMF are very similar to the Sc/Sd templates using a Scalo IMF, and at
the resolution of the LZT survey, they are nearly identical.



Alternatively, the PEGASE package (Fioc & Rocca-Volmerange 1997) allows us to generate a set of solar
metallicity spectra with different ages, different star formation rates (SFR) and different IMFs taken from Rana
& Basu (1992). Details can be found in Fioc & Rocca-Volmerange (1997). Included in the PEGASE package is an
atlas of templates of eight galaxy types. For the various spectral types (E, S0, Sa, etc.), the atlas provides
65 templates with ages from 1 Myr to 16 Gyr; each age sequence is normalized so that the 13-Gyr templates fit
present-day SEDs of observed galaxies with solar metallicity.

Even if true spectra cover a large range of metallicities and ages, the resolution R ~ 40 of the LZT survey
does not allow us to distinguish between a PEGASE young Sa galaxy template and a PEGASE old Sb, Sc, Sd or Im
galaxy template. Therefore, we include in the mock LZT catalog only templates showing significant differences
and, instead of using the ages given by PEGASE as true indicators of the evolution of galaxies with redshift, we
use them as indicators of the morphological sequence. Hence, for elliptical galaxy SEDs, we use the 23 templates
of the PEGASE atlas called E13 and older than 1 Gyr (the templates have a short SFR, followed by passive
evolution). For spiral galaxy SEDs, we use the 31 templates of the PEGASE atlas called Sa13, older than 350 Myr
(we use a constant SFR for spiral galaxies). Figure 9 shows several Sa13 templates from the PEGASE library at
various ages. As already mentioned, the 13-Gyr-old template is fitted to a local Sa galaxy spectrum with solar
metallicity. We interpret the age sequence given by the PEGASE Sa13 templates as follows: late-type galaxies
(Sc, Sd, Im) are modeled by young Sa13 galaxy templates (with ages < 3 Gyr, 15 templates), and early-type
galaxies (Sa, Sb) by old Sa13 galaxy templates (with ages > 3 Gyr, 16 templates).

There are notable differences between PEGASE and 
GISSEL elliptical galaxies in the UV, whereas spiral tem- 
plates look alike in both models. We also emphasize that 
both GISSEL and PEGASE packages rely on the stellar 
library of Pickles, which is noticeably deficient in AGB
star SEDs. AGB stars are still poorly known but repre-
sent a non-negligible part of the total integrated flux of
galaxies.

The mock LZT galaxy catalogues are built using the (evolving) redshift distributions in Fig. [?]. Early-type galaxies are randomly extracted from the 12 GISSEL E/S0/Sa/Sb templates, the 23 PEGASE E13 templates, and the 16 Sa13 templates older than 3 Gyr. Late-type galaxies are randomly extracted from the 12 GISSEL Sc/Sd/Sm/Im templates, and the 15 PEGASE Sa13 templates younger than 3 Gyr. The numbers of early- and late-type galaxies are those plotted in Fig. [?] as non-evolving early-type and evolving late-type redshift distributions for the LZT survey (see also Table [?]).

2.3.3. Emission-line galaxies 

We do not model the emission lines in the galaxy SEDs, 
because in the medium-band filters used by the LZT, only 
the brightest emission lines will be detected. At first order, 
an emission line will be detected only if 



$$\frac{W_{\rm line}}{W_{\rm filter}} > \frac{\rm Threshold}{S/N} \qquad (2)$$

where $W_{\rm line}$ is the line equivalent width, $W_{\rm filter}$ is the filter bandwidth, Threshold is the detection threshold in units of standard deviation, and S/N is the signal-to-noise ratio. For Threshold = 1, S/N = 3, and a typical $W_{\rm filter}$ = 150 Å, the emission line must have $W_{\rm line}$ > 50 Å. Only QSOs and Seyfert galaxies reach this level of emission.
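The detectability criterion of Eq. 2 can be checked numerically. The following is a minimal sketch; the function names are illustrative and not part of the original analysis.

```python
def min_equivalent_width(filter_width, threshold, snr):
    """Smallest line equivalent width satisfying Eq. 2, in the units of filter_width."""
    return filter_width * threshold / snr


def line_is_detected(line_width, filter_width, threshold, snr):
    """Eq. 2: the line is detected if W_line / W_filter > Threshold / (S/N)."""
    return line_width / filter_width > threshold / snr


# Values quoted in the text: Threshold = 1, S/N = 3, W_filter = 150 Angstrom.
limit = min_equivalent_width(150.0, 1.0, 3.0)
print(limit)
```

With the values quoted in the text, the limit evaluates to the 50 Å stated above.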

We emphasize that the analysis of the QSO sub-sample 
included in the LZT mock catalog (see next Sect.) demon- 
strates that strong emission lines contribute to improving 
the object classification and the redshift determination 
which are obtained by the PCA. As seen in Fig. and 
described in Sect. 3.1 below, the PCA sequence of QSOs 
is clearly separated from the blue part of the stellar se- 
quence, and this is caused by the strong emission lines 
present in the QSO SEDs. 

Although Seyfert galaxies only represent a small fraction (5%) of the galaxy populations at low redshift (Reichert 1992), the fraction of galaxies with strong emission lines may be larger at z ≳ 0.5. However, we consider that it is not necessary to include them in the mock LZT catalogues, as they have similar SEDs to QSOs, and would therefore make a negligible change to the PCA eigenbasis. Seyfert galaxies and other galaxies with strong emission lines would deviate from the locus of normal galaxies in the PCA, and would thus be easily identified; their redshift would be measured with similar accuracy as for the QSOs (see Sect. [?]). They could also be directly identified from their SEDs using a break-finding algorithm (Cabanac et al.), or a cross-correlation analysis (as shown by preliminary tests performed by R. Cabanac).

Fig. 8. Galaxy template spectra from GISSEL (Bruzual & Charlot 1993). In each frame, the Initial Mass Function (IMF) and metallicity of the spectra vary from top to bottom as follows: Salpeter IMF, with 20% solar, 40% solar, and solar metallicity; Scalo IMF, with 20% solar, 40% solar, and solar metallicity.

Fig. 9. Sa13 galaxy template spectra from PEGASE (Fioc & Rocca-Volmerange 1997). The ages of the spectra are given in million years. The metallicity is adjusted to have a solar value for the 13-Gyr-old spectrum.



2.4. Quasi-stellar objects 

The third kind of objects included in the simulations are the quasi-stellar objects (QSOs). We use the preliminary results of the on-going 2dF QSO Survey (Boyle et al. 2000). The 2dF QSO survey has been optically selected in the U, Bj and R bands from UKST photographic plates. The Bj QSO LF is found to follow a pure luminosity evolution (Boyle et al. 2000), which can be modeled by

$$\phi(L_B, z)\,dL = \frac{\phi^*\,dL}{\left[\frac{L_B}{L_B^*(z)}\right]^{3.6} + \left[\frac{L_B}{L_B^*(z)}\right]^{1.8}} \qquad (3)$$

with

$$\log\left[\frac{L_B^*(z)}{L_B^*(0)}\right] = 1.4z - 0.27z^2 \qquad (4)$$

or equivalently

$$M_B^*(z) = M_B^*(0) - 2.5\,(1.4z - 0.27z^2). \qquad (5)$$

We assume the same analytical description for the R QSO LF, and choose to adopt for the simulations $M_R^* \simeq -21$ and $\Phi_R^* = 10^{-6}$ Mpc$^{-3}$ mag$^{-1}$, a ~35% larger value than calculated from the Bj 2dF survey (Boyle et al. 2000). Equation 3 can be rewritten in terms of absolute magnitude $M_R$:

$$\phi(M_R, z)\,dM_R = \frac{0.4\ln(10)\,\Phi_R^*\,dM_R}{X^{3.6-1} + X^{1.8-1}}, \qquad X = 10^{\,0.4[M_R^*(0) - M_R] - 1.4z + 0.27z^2}. \qquad (6)$$

We assume that Eq. 6 is a reasonable prediction of the LZT QSO sample and we extrapolate the QSO LF to the LZT apparent magnitude limit of $R_c$ < 23. We also set a bright cut-off at $R_c$ = 15. The resulting redshift distribution in the interval 0 < z < 4 is shown in Fig. 10.

Figure 11 shows a composite QSO template derived from the LBQS sample (Francis et al. 1991), and a sample of 101 QSOs observed in the UV using HST (Zheng et al. 1997). Because the QSO template provided by Francis et al. is bounded at 6000 Å, and the LZT filters extend out to $10^4$ Å, we approximate the missing continuum by a power-law in the range 6000-10 000 Å. In the blue, the composite spectrum is bounded at 800 Å. In order to define the U and blue LZT magnitudes for all QSOs with z < 4, we extrapolate the spectrum to 300 Å at a constant flux. Note that because of Lyman α absorption, the continuum of the spectrum is poorly defined at wavelengths bluer than ~1200 Å (see Vanden Berk et al. 2001); this choice only affects QSOs with z > 2.75, among which only those with weak emission lines would be "lost" in the star sample. We also add a synthetic Hα emission line whose intensity is related to the intensity of the Hβ emission line already present in the composite spectrum, according to the typical Hα/Hβ line ratio of ~4, obtained in models of broad line regions (Osterbrock 1989). A new composite spectrum has been made available by the SDSS consortium (Vanden Berk et al. 2001) while the present article was in the refereeing process. Because both the emission lines and the continua of the 2 composite spectra are remarkably similar, we did not update our simulations to include the SDSS composite spectrum: the wide spectral range of the SDSS spectrum would make no improvement to our analysis, as the range covered by the 2dF composite spectrum used here is sufficient to describe the full wavelength interval of the LZT filters for QSOs with 0 < z ≲ 4.

Fig. 10. Redshift distribution for QSOs with 15 < $R_c$ < 23 in an Einstein-de Sitter Universe, using the 2dF QSO survey LF (Boyle et al. 2000).

Fig. 11. Composite QSO spectrum at z = 0 (Francis et al. 1991). The wavelengths of the centers of the LZT filters are overlaid (o). At red wavelengths, the spectrum is extrapolated by a power-law from 6000 to $10^4$ Å, to which is added a synthetic Hα line. In the UV, the spectrum is completed with observations from HST (Zheng et al. 1997).

The integrated QSO number count per deg² is extrapolated from the differential number counts in the 2dF QSO survey (Boyle et al. 2000). To a limiting apparent magnitude of R < 23, the integrated count of QSOs per square degree is ~130. The mock LZT catalogues are generated for 1 deg², hence each simulation is obtained by randomly drawing 130 QSO templates with the same redshift distribution as in Fig. 10, and subsequently adding noise according to the LZT efficiency (see the next Sect.). The range of signal-to-noise ratio in the continuum of the synthetic QSO spectra is 3-100.
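As an illustration of how the adopted QSO luminosity function (Eqs. 5 and 6) can be evaluated, the sketch below codes the double power law in absolute magnitude. It assumes that the adopted value M*_R ≈ -21 refers to z = 0 and that the slopes 3.6 and 1.8 and the evolution term 1.4z - 0.27z² apply unchanged to the R band, as stated in the text; the function and constant names are hypothetical.

```python
import math

ALPHA, BETA = 3.6, 1.8   # power-law slopes of the luminosity function
PHI_STAR_R = 1e-6        # normalization, Mpc^-3 mag^-1
M_STAR_R0 = -21.0        # characteristic magnitude, assumed here to be the z = 0 value


def m_star(z, m_star0=M_STAR_R0):
    """Characteristic magnitude under pure luminosity evolution (Eq. 5)."""
    return m_star0 - 2.5 * (1.4 * z - 0.27 * z ** 2)


def qso_lf(m_r, z):
    """Number density per unit magnitude at absolute magnitude m_r and redshift z (Eq. 6)."""
    x = 10.0 ** (0.4 * (M_STAR_R0 - m_r) - 1.4 * z + 0.27 * z ** 2)
    return 0.4 * math.log(10.0) * PHI_STAR_R / (x ** (ALPHA - 1.0) + x ** (BETA - 1.0))


# At the characteristic magnitude, X = 1 and the density is 0.2 ln(10) Phi*.
print(qso_lf(m_star(2.0), 2.0))
```

Drawing a mock sample would additionally require the cosmological volume element and the apparent magnitude limits, which are not sketched here.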

Because the number of QSOs is relatively small in our 
mock catalogues we decided to use only the composite 
SED of Francis without slope variations. Nevertheless, it 
is likely that the slopes of the continuum of real QSO 
SEDs vary around the mean slope of the composite SED 
of Francis. The effect of such a variation on the PCA would 
be to spread the locus of QSOs from a line to a surface; the 
area of this surface would be related to the standard devi- 
ation in the variation of the slope. This variation could 
degrade the identification of QSOs, because the flatter 
the slopes, the higher the similarities between QSOs and 
emission-line galaxies. On the other hand, the measured 
redshift accuracies (Sect. 5) should not be affected by this 
simplification. 



2.5. Noise 

Photon noise and detector read-out noise are added to the SED of each type of object according to the response curve $T(\lambda)$ of the Large Zenith Telescope (LZT), defined as the product of the detector sensitivity curve $CCD(\lambda)$, the transmission curve of the telescope/instrument optics $O(\lambda)$, and the sky transmission $I(\lambda)$:

$$T(\lambda) = I(\lambda)\,O(\lambda)\,CCD(\lambda). \qquad (7)$$

For an object with intrinsic flux $F_0(\lambda)$, the final flux $F(\lambda)$ obtained after "observing" the object with the LZT telescope+instrument+detector and correcting it using an absolute flux calibration is

$$F(\lambda) = F_0(\lambda)\left[1 + \frac{\rm gauss}{S/N(\lambda)}\right] \qquad (8)$$



S/N{\) 
5(A) 



9(A) 



F (X) 



F (X)+A src sky{\) 
10 20 he 



A BIC RON 2 g(X) 
A pix T(A) JV night 



A r AA At A, 



(9) 



(10) 



Here, gauss is a Gaussian random generator with zero mean and a root-mean-square dispersion of 1, and $S/N(\lambda)$ is the "observed" signal-to-noise ratio in the spectrum $F(\lambda)$; sky($\lambda$) is a composite sky spectrum, built from the Kitt Peak night sky spectrum (Massey & Foltz 2000), the Mauna Kea KECK LRIS OH emission lines atlas (http://www2.keck.hawaii.edu), and the GEMINI near-IR modeled continuum (http://www.us-gemini.noao.edu). Figure 12 shows the composite high-resolution sky spectrum and its medium-band counterpart using the LZT filters. In Eq. 9, $A_{\rm src}$ is the area of the observed object in arcsec² on the detector, $A_{\rm pix}$ = 0.245 arcsec² is the area of one CCD pixel, and RON = 11 e⁻ is the CCD readout noise. The function $g(\lambda)$ in Eq. 10 is the flux expressed in Joule/second/meter³ corresponding to one photon having the central wavelength $\lambda_c$ of the LZT filter considered (see Fig. [?]), arriving on the detector per exposure time $\Delta t$ and per wavelength interval $\Delta\lambda$ (of the considered filter), given the area of the LZT mirror $A_{\rm mirror}$ = 28.3 m² (wavelengths are expressed in meters, time in seconds); $hc = 1.992 \times 10^{-25}$ J m is the product of the Planck constant by the speed of light. Because the LZT is operated in drift-scan mode, the exposure time is fixed to $\Delta t$ = 65 s. The exposure time is effectively increased by observing a given region of the survey in a given filter during several nights $N_{\rm night}$ (see Eq. 9).
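The noise prescription can be sketched as follows, assuming that Eq. 8 has the form F(λ) = F₀(λ)[1 + gauss/(S/N(λ))], i.e. that the noise added in each filter is a unit Gaussian deviate scaled by F₀/(S/N). The per-filter S/N values are taken as given inputs (Eqs. 9 and 10 are not reproduced); the function name is hypothetical.

```python
import random


def add_noise(flux0, snr, rng=random):
    """Return noisy fluxes F = F0 * (1 + gauss / (S/N)), filter by filter.

    flux0 : intrinsic fluxes, one value per filter
    snr   : signal-to-noise ratios in the same filters
    rng   : object providing gauss(mu, sigma); the random module by default
    """
    return [f * (1.0 + rng.gauss(0.0, 1.0) / s) for f, s in zip(flux0, snr)]


# Example: a flat spectrum observed through 41 filters at S/N = 10.
rng = random.Random(42)
noisy = add_noise([1.0] * 41, [10.0] * 41, rng)
print(len(noisy))
```

With this form, the relative scatter of the noisy fluxes in a filter is 1/(S/N), so a filter at S/N = 10 fluctuates by about 10% around the intrinsic flux.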

The number of optical surfaces in the telescope/instrument combination is 11: the mirror (80% reflectivity), and 4 lenses plus the CCD window (98% transmission is assumed for each of the 10 surfaces). For simplicity the transmission is supposed to be achromatic. The LZT total efficiency curve $T(\lambda)$ has a bell shape between 3000 Å and 10000 Å, with a flat maximum transmission of 0.65 running from 5000 Å to 8000 Å. Figure 13 shows the $S/N(\lambda)$ calculated using Eq. 9 for an O5V star, a G0V star and an M6V star, with their lowest signal-to-noise ratio set to 3.35: this corresponds to the overall signal-to-noise ratio obtained from 5 contiguous pixels each having a signal-to-noise ratio of 1.5. Note that we choose such a low detection level per pixel in order to test the PCA. Given the LZT median seeing of 0.9 arcsec FWHM (Full-Width-Half-Maximum), we do not expect stars to be less extended than 5 pixels, as this corresponds to the area of a disk of 1.2-arcsec diameter projected onto CCD pixels with an area $A_{\rm pix}$ = 0.245 arcsec². We thus adopt the limiting signal-to-noise ratio of 3.35 as our detection threshold for the LZT mock catalog. For galaxies at z ~ 0.5, a typical FWHM of 3 arcsec (see Sect. 4.1) yields an area of 28.8 pixels, and a detection threshold of S/N ~ 1.5 per pixel yields an overall S/N ~ 8.

Fig. 12. Composite sky spectrum using the Kitt Peak night sky spectrum (Massey & Foltz 2000), the Mauna Kea KECK LRIS OH emission lines atlas (http://www2.keck.hawaii.edu), and the GEMINI near-IR modeled continuum (http://www.us-gemini.noao.edu). The filled diamonds are the LZT U+medium-band filter fluxes used for noise computations.
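The arithmetic behind the optics transmission and the detection thresholds quoted above can be reproduced with a few lines. This is a sketch under the assumption that the overall S/N of an object spread over n pixels of equal S/N scales as the square root of n; the function names are illustrative.

```python
import math

PIXEL_AREA = 0.245  # arcsec^2, area of one CCD pixel


def optics_transmission(mirror=0.80, surface=0.98, n_surfaces=10):
    """Transmission of the mirror followed by n_surfaces refractive surfaces."""
    return mirror * surface ** n_surfaces


def pixels_in_disk(diameter_arcsec, pixel_area=PIXEL_AREA):
    """Number of pixels covered by a disk of the given angular diameter."""
    return math.pi * (diameter_arcsec / 2.0) ** 2 / pixel_area


def combined_snr(snr_per_pixel, n_pixels):
    """Overall S/N of n_pixels pixels sharing the same per-pixel S/N."""
    return snr_per_pixel * math.sqrt(n_pixels)


print(round(optics_transmission(), 2))        # mirror plus 10 surfaces
print(round(pixels_in_disk(1.2), 1))          # stellar disk of 1.2 arcsec
print(round(combined_snr(1.5, 5), 2))         # 5 pixels at S/N = 1.5
print(round(pixels_in_disk(3.0), 1))          # galaxy of 3 arcsec FWHM
print(round(combined_snr(1.5, 28.8), 1))      # 28.8 pixels at S/N = 1.5
```

These values agree with the 0.65 peak transmission, the 5-pixel stellar footprint, the 3.35 detection threshold, and the overall S/N ~ 8 for galaxies given in the text.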



We also define the overall signal-to-noise of each spectrum as the median value of $S/N(\lambda)$ over the 41 LZT filters. For the spectra in Fig. 13, these median S/N are 44 (O5V), 14 (G0V), and 122 (M6V). Figure 13 therefore illustrates that very blue or very red spectra must have a high median S/N to be detected over the entire range of the LZT filters. To simplify the analysis and interpretation, the SEDs of all classes/types of objects in a given mock catalog are set to the same median signal-to-noise ratio. Mock catalogues are generated such that only objects whose lowest-S/N filter lies above the threshold limit of $S/N(\lambda)$ = 3.35, and whose median signal-to-noise ratio is equal to a given value, are included: any object having at least one filter with $S/N(\lambda)$ < 3.35 is therefore not included in the catalog. Catalogues with median signal-to-noise ratios of 100, 20, 10, and 6 are used for the analysis reported here. In the following Sects., the label S/N always refers to the median signal-to-noise ratio of the spectra in the mock catalogue considered. As shown in Fig. 13, we emphasize that at a median S/N ratio of 100, the bluest objects will have a signal-to-noise ratio of ~10 in their reddest filters, whereas the reddest objects will have a signal-to-noise ratio of ~3 in their bluest filters; for flat-spectrum objects, the range of signal-to-noise ratio described by the spectra will be narrower.
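The selection rule described above can be sketched as a small filter on the per-filter S/N values. The rescaling helper is an assumption about how a spectrum could be brought to a target median S/N (a uniform scaling of all filters); it is not taken from the text.

```python
from statistics import median

DETECTION_THRESHOLD = 3.35  # minimum S/N required in every filter


def scale_to_median(snr_per_filter, target_median):
    """Uniformly rescale the S/N values so that their median equals target_median."""
    factor = target_median / median(snr_per_filter)
    return [s * factor for s in snr_per_filter]


def keep_in_catalog(snr_per_filter, threshold=DETECTION_THRESHOLD):
    """An object is kept only if no filter falls below the detection threshold."""
    return min(snr_per_filter) >= threshold


# Hypothetical 3-filter example: rescale to a median S/N of 20, then test.
scaled = scale_to_median([2.0, 10.0, 40.0], 20.0)
print(scaled, keep_in_catalog(scaled))
```

A very red or very blue object can thus pass the median requirement and still be rejected because one filter at the faint end of its spectrum falls below 3.35.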



3. Methodology 

3.1. Principal Component Analysis 

For a detailed description of the PCA, the reader is referred to the books of Murtagh & Heck (1987) or Kendall (1972), and to the seminal paper of Connolly et al. (1995b). In this section we outline the results of PCA and develop the classification method.

Following Connolly et al. (1995b), consider a set $f$ of $N$ vectors $f_{\lambda i}$ of $M$ elements, $i \in \{1 \ldots N\}$, $\lambda \in \{1 \ldots M\}$, normalized to have unit scalar products $[f_i \cdot f_i]^{1/2} = 1$. In our case, $M$ is the number of filters: $M = 41$. The PCA derives a set of $\min(N, M)$ orthogonal eigenvectors $e_j$ (that is, $M = 41$ eigenvectors in the present case, as $N \gg M$), using criteria of decreasing maximum variance of the spectra when projected onto the eigenvectors. Each vector $f_i$ can be written as a linear combination of the $e_j$:

$$f_{\lambda i} = \sum_{j=1}^{M} y_{ij}\, e_{\lambda j} \qquad (11)$$

where $y_{ij}$, denoted eigencomponent, is the weight of the $j$th eigenvector in the $i$th vector. The first eigenvector $e_1$ is the mean vector over the $f_{\lambda i}$. Each weight $y_{i1}$ measures how much $f_{\lambda i}$ is similar to the mean vector, i.e. gives its projection onto the mean vector; the second vector $e_2$ lies in the direction of highest variance orthogonal to $e_1$, etc.

The main advantage of the PCA is that when the vectors $f_i$ are correlated (as is the case for astronomical SEDs), most of the discriminatory power of the linear combination (Eq. 11) is carried by the first few eigenvectors, and the high-order eigenvectors carry mostly the noise of the spectra. The PCA therefore provides a powerful filter for the set $f$ (see Galaz & de Lapparent 1998). For illustration, Table 4 shows the eigenvalues $\gamma_j$ of a PCA performed on one mock LZT catalog described in Sect. 2. As shown by Connolly et al. (1995b), each eigenvalue $\gamma_j$ is the contribution of the corresponding eigenvector $e_j$ to the variance of the set $f$, and therefore describes the relative power of each eigenvector in the dataset (one can intuitively realize that if all vectors have similar directions, they can be described by a small number of components). The power $P_j$ carried by each eigenvector $e_j$ can be measured as

$$P_j = \frac{\gamma_j}{\sum_{k=1}^{M} \gamma_k}. \qquad (12)$$

Table 4 therefore shows that the first 3 eigenvectors $e_1$, $e_2$, $e_3$ carry 87.6%, 9%, and 2.2%, respectively, of the descriptive power; the sum of these 3 contributions yields 98.8%, and adding $e_4$ increases the summed contributions to 99.3%. This means that the first 3 vectors actually carry most of the information and, to a good approximation, the weights of the other eigenvectors may be neglected. This is the reason why only the first 3 or 4 eigencomponents are usually kept to describe a set of galaxies at z = 0 (Ronen et al. 1999).

Table 4. Eigenvalues $\gamma_j$ of a typical PCA-derived eigenbasis $e$ from a LZT mock catalog which contains a mix of 3368 stars, 11600 galaxies and 170 QSO SEDs with S/N = 100, as described in Sect. 2. The first 3 eigenvalues contain 98.8% of the descriptive power.

Fig. 13. Signal-to-noise distributions $S/N(\lambda)$ of 3 stellar spectra with types O5V, G0V, and M6V, with median S/N of 44, 14, and 122, respectively. The lowest S/N is set to 3.35 (dotted line): it corresponds to the signal-to-noise ratio of a point source extended over 5 contiguous pixels, with a detection threshold of 1.5 per pixel.

Fig. 14. PCA eigencomponents $y_2$ versus $y_1$ a), $y_3$ versus $y_1$ b), $y_4$ versus $y_1$ c), $y_3$ versus $y_2$ d), $y_4$ versus $y_2$ e), $y_4$ versus $y_3$ f), and a plot of the first 4 eigenvectors g) for a catalog containing 131 stars (*), 7800 galaxies (•) and 80 QSOs (o) at a median signal-to-noise ratio of S/N = 100 (see Sect. [?]). The 3 classes of objects occupy different regions of the graphs. Galaxies and QSOs show clear redshift sequences. The loci of the various stellar types are indicated in each graph.

Our case is more complex because we have to analyze a catalog containing different classes of objects, which in addition span different redshift intervals, hence showing a wide variety of spectral energy distributions. We therefore need to use all significant information. Several tests were performed, with PCAs having 4, 5, 6, 7, 8, 9, 10 and 20 eigencomponents. Fewer than 5 components are clearly insufficient to describe the LZT mock samples (see Sect. 6.1). One must reach 9 components to restore the full details of the LZT mock samples. We therefore choose to perform the analysis of the LZT mock catalogues with a 10-eigencomponent PCA, as the remaining power after the 10th component just reaches below 0.1% (see Table 4 and Eq. 12). With 10 eigencomponents, Eq. 11 then becomes

$$f_{\lambda i} \simeq \sum_{j=1}^{10} y_{ij}\, e_{\lambda j}. \qquad (13)$$
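The choice of the number of eigencomponents can be illustrated with a short sketch that converts eigenvalues into relative powers (assuming the power of Eq. 12 is each eigenvalue divided by the sum of all eigenvalues) and finds how many components are needed before the remaining power drops below a given level. The eigenvalues in the example are hypothetical, not those of Table 4.

```python
def relative_power(eigenvalues):
    """Fraction of the total variance carried by each eigenvector."""
    total = sum(eigenvalues)
    return [g / total for g in eigenvalues]


def components_needed(eigenvalues, max_residual=1e-3):
    """Smallest number of leading components leaving less than max_residual power."""
    cumulative = 0.0
    for k, p in enumerate(relative_power(eigenvalues), start=1):
        cumulative += p
        if 1.0 - cumulative < max_residual:
            return k
    return len(eigenvalues)


# Hypothetical, steeply decreasing eigenvalue spectrum.
example = [90.0, 9.0, 0.95, 0.05]
print(relative_power(example))
print(components_needed(example))  # residual below 0.1% after 3 components
```

The 0.1% residual used as the default mirrors the criterion quoted in the text for retaining 10 components.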



Figure 14 plots the first 4 weights or PCA eigencomponents $y_1$, $y_2$, $y_3$, and $y_4$ resulting from a PCA on the same mock catalog that was used to obtain Table 4. The last graph shows the corresponding first 4 eigenvectors; the relative powers which they carry are indicated in the graph. Stars (asterisks), galaxies (dots) and QSOs (open circles) occupy well-defined regions of the 4-D space described by the first 4 eigenvectors $e_1$, $e_2$, $e_3$, and $e_4$ (see frames b & d). As shown by Connolly et al. (1995b) and Galaz & de Lapparent (1998), using the first 3 eigencomponents, the stellar spectral sequence (shown with asterisks) describes an arc, with later types being at lower values of $y_2$. The stellar spectral types of selected stars are indicated in each graph. O, B, and A stars have positive values of $y_2$, whereas M stars have negative values of $y_2$. The other spectral types fill the sequence. The galaxies (plotted as dots) describe a 2-D surface, starting from the stellar sequence at low redshift, and extending away from it along the paths generated by a redshift step of 0.01. The highest redshift considered for the galaxies is ~2. The QSOs (open circles) describe a linear sequence defined by their redshift (1 < z < 4). The QSO sequence is distinct from the stellar sequence at low redshift, and merges with it at z ~ 3 (see Sect. 4.2); at this redshift, a QSO cannot be distinguished from an F star.

Therefore, the PCA not only provides a spectral classification within each class of object (stars, galaxies, QSOs) but also allows a classification scheme by which the 3 classes of objects can be distinguished. The definition of the different spectral types for each class of object can be done by parameterization of the 1-D or 2-D surfaces they describe. This is developed in Sect. 3.3.

3.2. The spectral classification

Because the PCA is a non-parametric approach, the eigenvectors derived for one catalog are specific and may be inappropriate for another catalog. In order to use the PCA for spectral classification, one must relate the internal correlations outlined by the PCA with the physical properties of the objects. The canonical but subjective method to achieve this purpose is to create realistic mock catalogues of templates and apply the PCA to them. Previous works by Connolly et al. (1995a), Connolly & Szalay (1999), Galaz & de Lapparent (1998) and Ronen et al. (1999) establish the efficiency of the PCA to produce an internal classification scheme related to physical properties of galaxies. For instance, Galaz & de Lapparent (1998) use the first 3 eigencomponents $y_{i1}$, $y_{i2}$, and $y_{i3}$ as Cartesian coordinates of a 3-D space, and convert them into the 2 angles defining the corresponding spherical coordinates, which are in turn used for the classification: one parameter is related to the color of the objects, i.e. the slope of the continua, and the other is an index of the intensity of the emission lines. Ronen et al. (1999) show by using stellar synthesis galaxy template spectra that one can track down some properties such as the age or the metallicity. The weakness of the method of Ronen et al. is that when the stellar synthesis models reproduce the observed spectra but are physically wrong, the derived physical properties are erroneous. A successful approach should include a direct calibration of the PCA on observed spectra with known types, ages, and metallicities (Galaz & de Lapparent 1998).



As pointed out by Connolly et al. (1995b), the PCA eigenvectors and corresponding eigencomponents are determined by the relative numbers of the different types of objects. Connolly et al. (1995b) introduce a morphological type normalization to account for this relative proportion of objects when catalogues with unrealistic proportions are used. Here, we prefer the direct approach of using realistic mock catalogues (see Sect. 2) which will be directly comparable to the LZT catalog.

The PCA applied to the mock LZT catalogues is used here to extract the relevant information for classifying objects and measuring their redshifts (for galaxies and QSOs). We choose a simple approach. Once the principal eigencomponents of the mock catalog are measured, we select the first 10 eigenvectors and define a 10-dimensional eigenbasis, in which each class of object occupies a given locus, defined by its spectral type among its class, and its redshift for galaxies and QSOs. As the paths followed by the galaxies and QSOs are monotonic functions of redshift, the PCA provides a template-independent method for measuring redshifts. Except for the redshift/spectral-type degeneracies discussed in Sect. 6.1, similar types of objects at similar redshifts (for galaxies and QSOs) tend to be nearby in the space defined by the 10-dimensional eigenbasis, thus providing a unique parameterization of a given SED. In the following, we describe how the object classes and types are defined and how the redshifts are measured.



3.3. Definition and measurement of object/spectral 
types 

We define object classes (star, galaxy, QSO) and types (O to M for stars, E to Irr for galaxies) by assigning to each template a number $T$. For stars, the 131 Pickles templates spanning the range of stellar types O to M and luminosities I to V are ordered from blue to red and are assigned types $T \in \{1, 2, \ldots, 131\}$. Galaxies are also ordered from blue to red as follows: the 12 GISSEL late-type templates (see Sect. 2.3.2); the 27 PEGASE late-type templates; the 12 GISSEL early-type templates; the 27 PEGASE early-type templates; these 78 spectra are assigned types $T \in \{132, 133, \ldots, 209\}$. QSOs are all assigned $T = 210$, as only one type of spectrum was used.

Given the SED $o_\lambda$ of an object, the projected components $o^p_j$ on the eigenbasis are simple scalar products:

$$o^p_j = \sum_{\lambda=1}^{41} o_\lambda\, e_{\lambda j} \qquad (14)$$

and the weighted distance $d_i$ from the object to any other object $i$ of the 10-D space is

$$d_i = \left[\sum_{j=1}^{10} \gamma_j \left(o^p_j - y_{ij}\right)^2\right]^{1/2} \qquad (15)$$

where $y_{ij}$ are the eigencomponents defined in Eqs. 11 and 13, and $e_j$ are the corresponding eigenvectors of the PCA performed on the mock catalog. Weighting the eigencomponents with their eigenvalues $\gamma_j$ is used to account for the increase in the noise from $y_{i1}$ to $y_{i10}$. The object type $T_{\rm object}$ is simply defined as the arithmetic mean of the types $T_{\eta_i}$ of the $\eta$ nearest neighbors:

$$T_{\rm object} = \frac{1}{\eta} \sum_{i=1}^{\eta} T_{\eta_i}. \qquad (16)$$



As a test, we calculate for our mock catalogues of 2993 galaxies with median S/N = 20 and 10, and for those with 2095 galaxies with median S/N = 6, the type $T_{\rm object}$ given by Eq. 16 for $\eta$ = 1, 2, 3, 4, 5, and 10. The resulting errors in the object classes and types are listed in Table 5: Cols. ±5, ±10, ±15 and > 15 give the number of galaxies for which $T_{\rm object}$ differs from the true type $T$ by 5, 10, 15, and more than 15 type units; Cols. star and QSO give the number of galaxies misclassified as star or QSO. Table 5 demonstrates that in all cases, the most accurate results are obtained for $\eta$ = 1, i.e. the nearest neighbor is always the best choice. The efficiency of the classification is further described in Sect. [?].

Note that in the nearest neighbor analysis used here, the effect of the input S/N onto the eigencomponents $y_i$ is not accounted for. Table 5 suggests that the effect on the classification would be negligible (as almost no errors occur even for large values of $\eta$). The effect on the type and redshift determination would be larger, and we refer the reader to Sect. 6.1 for a more general discussion.
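The nearest-neighbor typing described in this section can be sketched in a few functions: projection of an SED onto the eigenbasis, the eigenvalue-weighted distance, and the mean type of the η nearest catalog entries. The data layout (a list of (eigencomponents, type) pairs) and all names are assumptions made for illustration; the toy example uses 2 components instead of 10.

```python
import math


def project(sed, eigenvectors):
    """Scalar products of the SED with each eigenvector (one list per eigenvector)."""
    return [sum(o * e for o, e in zip(sed, vec)) for vec in eigenvectors]


def weighted_distance(a, b, eigenvalues):
    """Distance between two sets of eigencomponents, weighted by the eigenvalues."""
    return math.sqrt(sum(g * (x - y) ** 2 for g, x, y in zip(eigenvalues, a, b)))


def classify(components, catalog, eigenvalues, eta=1):
    """Mean type T of the eta nearest catalog entries.

    catalog : list of (eigencomponents, type) pairs from the mock catalog
    """
    ranked = sorted(
        catalog,
        key=lambda entry: weighted_distance(components, entry[0], eigenvalues),
    )
    nearest = ranked[:eta]
    return sum(t for _, t in nearest) / len(nearest)


# Toy catalog with 2 eigencomponents per object and types 10, 20 and 210.
catalog = [([0.0, 0.0], 10), ([1.0, 0.0], 20), ([5.0, 5.0], 210)]
print(classify([0.1, 0.0], catalog, [1.0, 0.5], eta=1))
```

Averaging over several neighbors (η > 1) mixes types across the sequence, which is consistent with the text's finding that η = 1 gives the most accurate types.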



3.4. Measurement of redshifts 

Because we apply the PCA to the observed SEDs, it is possible to measure redshifts directly with a technique similar to the classification method (Sect. 3.2). In the 10-D space defined by the first 10 eigenvectors, galaxies and quasars follow redshift sequences. We define the redshift $z_o$ of an unknown object as the average redshift $z$ of the $\eta$ nearest neighbors from the mock catalog which lie inside a radius $r_\eta$ and have type $T_{\rm object} \pm 5$:

$$z_o = \frac{1}{\eta} \sum_{i=1}^{\eta} z_{\eta_i}. \qquad (17)$$

$r_\eta$ is defined by $\eta$ and has the same unit as $d$ (Eq. 15), and therefore as the eigencomponents $y_i$ (see Eq. 11 and Fig. 14). The value of $r_\eta$ is usually small: for $\eta$ = 10, we estimate $r_\eta \approx 0.05$ for the 3 classes of object. If more than one class is present among the $\eta$ neighbors, i.e. if the object falls in a locus containing a mix of various classes, the class is not robustly defined and the average redshift is given for all classes inside $r_\eta$. Our mock catalog is such that for 98% of the galaxies, the redshift accuracy is not strongly dependent on $\eta$ for $\eta$ < 5, and for 90% of the QSOs, $\eta$ = 1, i.e., the nearest neighbor, always provides a more accurate redshift than the average over a group of objects (the remaining 2% of galaxies and 10% of QSOs are affected by degeneracies in redshifts, which are described in Sect. 6.1).

Table 5. Error in the determination of the types for 2993 galaxies at median S/N = 20 and 10, and 2095 galaxies at median S/N = 6, for different values of $\eta$ (see text for details).

Median S/N = 20
  η        type error #                 class error #
          ±5    ±10    ±15    >15       star    QSO
  1     2796     14     20    133
  2     2769     23     22    149
  3     2756     24     33    150
  4     2746     38     22    157
  5     2726     56     27    154
 10     2703     50     50    160

Median S/N = 10
  η        type error #                 class error #
          ±5    ±10    ±15    >15       star    QSO
  1     2509     76     32    374
  2     2469     75     43    404
  3     2455     86     71    379
  4     2433    125     40    393
  5     2418    130     65    378
 10     2400    128     76    387

Median S/N = 6
  η        type error #                 class error #
          ±5    ±10    ±15    >15       star    QSO
  1     1594     84     85    332
  2     1577    114     73    331
  3     1587    105     88    315
  4     1578    136     61    320
  5     1590    126     62    317
 10     1584    143     52    316
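The redshift estimate can be sketched along the same lines as the type assignment: keep the catalog entries lying within the radius and within ±5 type units, then average the redshifts of the η nearest ones. Treating the radius as a fixed input of 0.05 is a simplification made here (the text defines it through η); the catalog layout and the names are hypothetical.

```python
import math


def weighted_distance(a, b, eigenvalues):
    """Distance between two sets of eigencomponents, weighted by the eigenvalues."""
    return math.sqrt(sum(g * (x - y) ** 2 for g, x, y in zip(eigenvalues, a, b)))


def estimate_redshift(components, obj_type, catalog, eigenvalues,
                      eta=1, radius=0.05, type_tolerance=5):
    """Mean redshift of the eta nearest neighbors of compatible type.

    catalog : list of (eigencomponents, type, redshift) triples
    Returns None when no catalog entry satisfies the radius and type cuts.
    """
    candidates = []
    for comps, cat_type, z in catalog:
        d = weighted_distance(components, comps, eigenvalues)
        if d <= radius and abs(cat_type - obj_type) <= type_tolerance:
            candidates.append((d, z))
    if not candidates:
        return None
    candidates.sort()
    nearest = candidates[:eta]
    return sum(z for _, z in nearest) / len(nearest)


# Toy catalog: two close galaxies of similar type, one of a different type, one far away.
catalog = [
    ([0.00, 0.0], 140, 0.30),
    ([0.01, 0.0], 141, 0.32),
    ([0.02, 0.0], 200, 0.90),
    ([1.00, 1.0], 140, 1.50),
]
print(estimate_redshift([0.0, 0.0], 140, catalog, [1.0, 1.0], eta=2))
```

In the toy example the entry of type 200 is excluded by the type cut although it lies within the radius, which is how the type criterion limits contamination from redshift/type degeneracies.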

4. Star/galaxy/QSO separation 

In this section we examine quantitatively how the separa- 
tion of stars, galaxies and QSOs can be performed using 
the PCA, and how the object profile can contribute to the 
analysis. 




Fig. 15. PCA projections $y_1$, $y_2$, and $y_3$ for a mock LZT catalog of 33,486 objects (3370 stars [+], 29,986 galaxies [•] and 130 QSOs [squares]). The SEDs have a median S/N ~ 100.



4.1. Extended versus unresolved objects 

In an Einstein-de Sitter universe, the angular size $\theta$ of an object with physical size $D$ is

$$\theta(z) = \frac{D H_0}{2c}\,(1+z)^2 \left[(1+z) - (1+z)^{1/2}\right]^{-1} \qquad (18)$$

where $c$ = 300,000 km s$^{-1}$, and $H_0$ is the Hubble constant. Following Sandage, Kron & Longair (1995), a typical disk galaxy has an effective diameter of $D$ = 10 kpc, which results in an angular size $\theta(z = 0.25)$ = 4.09"; this value assumes a mean surface brightness in the Johnson V band $\mu_V$ = 23.5 mag arcsec$^{-2}$, and an exponential disk profile (see Table 6.2, p. 283, in Sandage, Kron & Longair 1995). For $\mu_V$ = 25.1 mag arcsec$^{-2}$, the angular size $\theta(z = 0.5)$ = 2.8". Hence, most bright galaxies at z ≲ 0.5 are expected to be more extended than the PSF of the LZT (~1-2"), and potential confusion between star and galaxy SEDs is not expected to occur beyond redshift 0.2. At redshifts higher than 0.2, the PCA is indeed robust in segregating stars from galaxies, owing to the redshifting of the galaxy SEDs. Only compact galaxies at low redshifts (z ≲ 0.2) might be intrinsically difficult to distinguish from stars in the medium-band system of the LZT. So far there is no evidence for a dominating population of compact galaxies at these redshifts. Moreover, the fraction of galaxies in the predicted LZT redshift distribution of Fig. [?] lying at z < 0.2 is ~10%, a non-negligible but minor component of the full sample.
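The angular sizes quoted above can be checked by coding Eq. 18 directly. The Hubble constant is not stated in this section; the sketch assumes H0 = 100 km/s/Mpc, a value chosen here because it approximately reproduces the quoted sizes. The function name is illustrative.

```python
import math

C_KM_S = 300_000.0                         # speed of light, km/s
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi  # radians to arcseconds


def angular_size_arcsec(d_kpc, z, h0=100.0):
    """Angular size (arcsec) of an object of size d_kpc at redshift z.

    Einstein-de Sitter universe (Eq. 18); h0 in km/s/Mpc is an assumed value.
    """
    d_mpc = d_kpc / 1000.0
    prefactor = d_mpc * h0 / (2.0 * C_KM_S)  # D * H0 / (2c), in radians
    theta_rad = prefactor * (1.0 + z) ** 2 / ((1.0 + z) - math.sqrt(1.0 + z))
    return theta_rad * ARCSEC_PER_RAD


print(round(angular_size_arcsec(10.0, 0.25), 2))
print(round(angular_size_arcsec(10.0, 0.5), 2))
```

Under this assumption a 10-kpc disk subtends about 4.1" at z = 0.25 and about 2.8" at z = 0.5, close to the values given in the text and larger than the 1-2" PSF.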



Fig. 16. Same as in Fig. 15 for a median S/N ~ 20.



Table 6. Error in the classification of ~3000 stars, ~3000 galaxies, and ~1000 QSOs for different values of median signal-to-noise ratio S/N (see text for details).

Median S/N   Input class   # (a)        Classification # (b)
                                     Stars   Galaxies    QSOs
   100       Stars          3181      3181
             Galaxies       2993                2993
             QSOs            984                           984
    20       Stars          2843      2833        10
             Galaxies       2993                2993
             QSOs            984                           984
    10       Stars          2092      2092
             Galaxies       2993                2993
             QSOs            984         6                 978
     6       Stars          1956      1923                  33
             Galaxies       2095                2095
             QSOs            741        42         9       690

(a) Input number of objects in each class
(b) Output number of stars, galaxies, and QSOs derived by the 10-component PCA



4.2. PCA classification efficiency 

One key parameter in determining the efficiency of the
PCA classification is the signal-to-noise ratio in the
spectra. Figure 15 shows a 3D plot of the first 3
eigencomponents of the PCA performed on a mock catalog
containing 33,486 objects (3370 stars, 29,986 galaxies and 130
QSOs) with a median signal-to-noise ratio S/N ~ 100 (see
Sect. 2.5). This catalog represents a sub-area of 1 deg$^2$



of the future LZT survey, and the PCA method is that 
Table 7. Error in the determination of the type for stars
and galaxies, for different values of median signal-to-noise
ratio S/N. Stars have a total of 131 types, and galaxies
have 78 types (see text for details).

Median S/N   Object class   Type error # a
                            ±5      ±10     ±15    > 15

   100       Stars          3177      3       -       1
             Galaxies       2916      -       4      25

    20       Stars          2617    163      33      30
             Galaxies       2796     14      20     133

    10       Stars          1658    290      73      71
             Galaxies       2509     76      32     374

     6       Stars          1560    317      38      41
             Galaxies       1594     85      84     332

a Number of objects classified within ±5, ±10, ±15 and more
than ±15 of their input types



difference between the real type and the PCA-measured 
type, as in Table |[ For galaxies, an error of ±10 on 
the type leads to a catastrophic error in redshift in 1/3 
of the cases, and an error of ±15 or more leads
systematically to a catastrophic error in redshift. Table 7
shows the results for the 4 median signal-to-noise ratios
S/N = 100, 20, 10, 6.

At a median S/N = 100, the PCA is able to discriminate
O, B, A, F, G, K, M star types and most luminosity and
metallicity classes. At a median S/N = 20, metallicity
differences within each stellar type are no longer
discriminated, and the luminosity class (I, II, III, IV, V) is
often mismatched. At a median S/N = 10 and lower, only major
continuum differences allow one to discriminate between
different stellar types. Table 7 shows that even at
S/N = 6, the PCA is able to discriminate types for half
of the objects within each class.



described in Sect. 3.2, based on the first 10
eigencomponents. In Fig. 16, the mock catalog of Fig. 15 is degraded
to a median signal-to-noise ratio S/N ~ 20. In both figures,
stars, galaxies and QSOs occupy different loci in the 3D
space, therefore allowing us to separate objectively the 3
classes of objects, and illustrating how robust the PCA is
to noise.

Table 6 compiles the efficiency of the classification as
a function of signal-to-noise ratio for a small mock catalog
described in Sect. 5 below. Following the previous section,
we assume that the 1347 galaxies at z $\lesssim$ 0.5 are extended,
and thus can be separated from stars using morphological
criteria. The remaining 1646 galaxies at z $\gtrsim$ 0.5 are
separated from stars and QSOs using Eq. ||. For a me- 
dian S/N = 100, the object classification is perfect, and 
although it slowly degrades as the signal-to-noise ratio
decreases, the performance of the classification is only weakly
dependent on the signal-to-noise ratio.
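
The efficiencies discussed in this section follow directly from the counts of Table 6. The sketch below is only an illustration: the names `TABLE_6`, `classification_efficiency` and `misclassified_fraction` are hypothetical, and empty cells of the table are assumed to be zero (the row sums match the input numbers under that assumption).

```python
CLASSES = ("stars", "galaxies", "QSOs")

# Rows: input class; columns: class assigned by the 10-component PCA.
# Counts are copied from Table 6; empty cells are assumed to be zero.
TABLE_6 = {
    100: [[3181, 0, 0], [0, 2993, 0], [0, 0, 984]],
    20: [[2833, 10, 0], [0, 2993, 0], [0, 0, 984]],
    10: [[2092, 0, 0], [0, 2993, 0], [6, 0, 978]],
    6: [[1923, 0, 33], [0, 2095, 0], [42, 9, 690]],
}


def classification_efficiency(confusion):
    """Fraction of each input class that is assigned to its own class."""
    return {
        name: row[i] / sum(row)
        for i, (name, row) in enumerate(zip(CLASSES, confusion))
    }


def misclassified_fraction(confusion, true_class, assigned_class):
    """Fraction of `true_class` objects that the PCA assigns to `assigned_class`."""
    i = CLASSES.index(true_class)
    j = CLASSES.index(assigned_class)
    return confusion[i][j] / sum(confusion[i])


if __name__ == "__main__":
    for snr, confusion in TABLE_6.items():
        eff = classification_efficiency(confusion)
        print(snr, {name: f"{100 * value:.1f}%" for name, value in eff.items()})
```

Keeping the full confusion matrix, rather than only the diagonal, makes it easy to see which confusions dominate, for instance QSOs assigned to stars at S/N = 6.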

Table 6 shows that QSO SEDs tend to be misclassified
as stellar SEDs (6% of QSOs at a median signal-to-noise
ratio of 6). There is no degeneracy between blue stars
(O, B or white dwarfs) and low-redshift QSOs, because
blue stars have no emission lines; QSOs are thus clearly
segregated. In the range 2.5 < z < 3, QSO and F/G
stellar SEDs are similar. For z $\gtrsim$ 3.5, QSO signatures are
unique again and misclassification is rare. Unfortunately,
the range 2.5 < z < 3 corresponds to the peak of the



expected QSO distribution (see Sect. 2.4). This explains
why such a large fraction of QSOs are misclassified as
stars for median S/N = 6. This points to a weakness of
this classification method, the solution of which probably
lies in a slightly different approach which we discuss in
Sect. 6.3.

4.3. PCA type identifications 

Table 7 allows us to evaluate the accuracies in the type
identifications for the 2 classes of objects having types 
(stars and galaxies). The accuracies are measured by the 



5. Redshift measurement accuracy 

In order to test the accuracy of the measurement of
redshifts with the PCA, we must make a prior hypothesis that
the mock catalog is a fair representation of real
observations. In that case, it is acceptable to compare sub-samples
of the main catalog to itself, for different S/N ratios, to
verify that we can recover physical information (here, the
redshift). In other words, internal errors will be a good
representation of the actual errors. On the other hand, if the mock
catalog is not a fair simulation of the observations, additional
systematic errors will degrade the measured accuracy. The
underlying motivation for using sub-samples of the mock
catalog (denoted "sub-mock" catalogues hereafter) is
to save computing time. Keeping this remark in
mind, we generate small mock catalogues of galaxy SEDs
according to the procedure described in Sect. 2. The small
mock catalogues contain ~ 3000 stars, ~ 3000 galaxies
and ~ 1000 QSOs. We then project the SEDs onto the first
10 eigenspectra derived from PCAs made on mock
catalogues with realistic proportions of stars, galaxies and QSOs
(~ 3000 stars, ~ 30 000 galaxies, ~ 1000 QSOs), and with
a median signal-to-noise ratio S/N = 100. The resulting
10 eigencomponents of each SED are compared to the
corresponding eigencomponents of the realistic mock catalog
using the least-square technique described in Sect. || (Eqs. 
[TBI and 0). 
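
The projection and comparison steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Sect. 3.2: the eigenvectors are obtained with a plain SVD, the comparison is an unweighted least-squares match to the nearest reference object, and `toy_seds` is a hypothetical stand-in for the mock catalogues (a smooth spectral break whose position moves with redshift, sampled in 40 filters). All function names are invented for this example.

```python
import numpy as np


def pca_basis(reference_seds, n_components=10):
    """Mean SED and leading eigenvectors of a reference catalog (one SED per row)."""
    mean = reference_seds.mean(axis=0)
    # Rows of vt are the eigenvectors of the covariance matrix,
    # sorted by decreasing variance.
    _, _, vt = np.linalg.svd(reference_seds - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(seds, mean, basis):
    """Eigencomponents of each SED on the given basis."""
    return (seds - mean) @ basis.T


def match_redshift(target_coeffs, reference_coeffs, reference_z):
    """Redshift of the closest reference object in eigencomponent space.

    Unweighted least squares is an assumption of this sketch.
    """
    diff = target_coeffs[:, None, :] - reference_coeffs[None, :, :]
    chi2 = (diff ** 2).sum(axis=2)
    return reference_z[chi2.argmin(axis=1)]


def toy_seds(z, n_filters=40):
    """Hypothetical SEDs: a smooth break whose position moves with redshift."""
    wave = np.linspace(4000.0, 10000.0, n_filters)
    break_position = 4000.0 * (1.0 + z[:, None])
    return 1.0 / (1.0 + np.exp(-(wave[None, :] - break_position) / 300.0))


rng = np.random.default_rng(0)
z_ref = np.linspace(0.0, 1.0, 201)            # reference ("realistic") catalog
ref = toy_seds(z_ref)
mean, basis = pca_basis(ref, n_components=10)
ref_coeffs = project(ref, mean, basis)

z_true = z_ref[::10]                           # small "sub-mock" catalog
noisy = toy_seds(z_true) + rng.normal(0.0, 0.01, size=(z_true.size, ref.shape[1]))
z_est = match_redshift(project(noisy, mean, basis), ref_coeffs, z_ref)
print("largest |z_est - z_true|:", np.abs(z_est - z_true).max())
```

The reference basis is computed once and reused for every sub-sample, which mirrors the statement that the eigenbasis remains unchanged while the projected catalogues vary.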

Figure 17 (frames labeled "ALL") shows the residuals
in the redshift measurement and Tables 8 to 11 show
the standard deviations in the residuals of measured
redshifts versus input redshifts, as a function of redshift, using
all galaxy templates available (54 PEGASE templates, 24
GISSEL templates). In order to evaluate the impact on the
error budget of using different templates for the galaxy
SEDs, we restrict the templates used in the sub-mock
catalogues, whereas the eigenbasis on which the various
sub-samples are projected remains unchanged and contains all
54 PEGASE templates and 24 GISSEL templates. For each median
signal-to-noise value, we generate sub-mock catalogues using for
galaxies: only the 24 GISSEL templates (Tables 8 to 11,
Fig. 17. Redshift residuals versus input redshift extracted from a $\chi^2$ reduction of the first 10 PCA eigencomponents.
In each frame, spectra in sub-mock catalogues are projected onto realistic mock simulations with a median
signal-to-noise ratio of S/N = 100. From top to bottom, in addition to ~ 3000 stars and ~ 1000 QSOs, the sub-mock
catalogues include ~ 3000 galaxy SEDs drawn respectively from all 78 galaxy templates of PEGASE+GISSEL
(ALL), from only the 24 galaxy templates of GISSEL (GISSEL), from only the 54 PEGASE templates (PEGASE), and from only
the 31 PEGASE Sal3 templates (Sal3). In all frames, only QSOs have redshifts z > 2. The corresponding standard
deviations are listed in Tables 8 to 11.



and frames labeled "GISSEL" in Fig. 17); only the 54
PEGASE templates (Tables 8 to 11, and frames labeled
"PEGASE" in Fig. 17); only the 31 Sal3 PEGASE
templates (Tables 8 to 11, Cols. labeled "PEGASE Sal3", and
frames labeled "Sal3" in Fig. 17). In Fig. 17, the simulations
are shown for median signal-to-noise ratios S/N = 100,
S/N = 20, S/N = 10, and S/N = 6.



All groups of templates shown in Fig. 17 and Tables
8 to 11 show similar trends. They yield accurate redshift
measurements at all redshifts for median S/N = 100, 20,
with $\sigma_{\rm Res.}$ $\lesssim$ 0.02 for galaxies at 0.1 < z < 1.2 or
1.8 < z < 2.0 and all QSOs (at z > 2.0), and $\sigma_{\rm Res.}$ ~ 0.02-0.04
for galaxies in 1.3 < z < 1.7. For a median S/N = 10,
the errors in the redshift measurement remain very small
with $\sigma_{\rm Res.}$ $\lesssim$ 0.04 for galaxies at 0.1 < z < 0.9,
$\sigma_{\rm Res.}$ ~ 0.03-0.1 for QSOs (at z > 2.0), and
$\sigma_{\rm Res.}$ ~ 0.03-0.2 for galaxies in 1.0 < z < 2.0. Even for
the lowest signal-to-noise ratio of S/N = 6, the redshift
measurement is robust to z < 0.7 for galaxies, with
$\sigma_{\rm Res.}$ ~ 0.03-0.1, and then
suffers from catastrophic degeneracies at z > 0.9, where
$\sigma_{\rm Res.}$ grows to ~ 0.2-0.3; similar problems affect QSOs,
which have $\sigma_{\rm Res.}$ ~ 0.05-0.2 for 2 < z < 3.0.
Table 8. Residuals in the redshift measurements for sub-mock simulations including ~ 3000 stars, ~ 1000 QSOs, and
~ 3000 galaxy SEDs with a median signal-to-noise ratio of 100 (see also Fig. 17).
Notes:

at z < 2, z is the redshift of the galaxies binned by intervals of 0.1
at z > 2, z is the redshift of the QSOs binned by intervals of 0.5
# is the number of galaxies or QSOs in the redshift bin considered
z is the average residual in the redshift

$\sigma_{\rm Res.}$ is the r.m.s. dispersion in the redshift errors defined as $\Delta z = |z_{\rm PCA} - z_{\rm input}|$
ALL corresponds to sub-mock catalogues with galaxies drawn from all galaxy templates
GISSEL corresponds to sub-mock catalogues with galaxies drawn from GISSEL templates only
PEGASE corresponds to sub-mock catalogues with galaxies drawn from PEGASE templates only
PEGASE Sal3 corresponds to sub-mock catalogues with galaxies drawn from PEGASE Sal3 templates only



Note that the dispersions in the redshift residuals for
median S/N = 10 and S/N = 6 are comparable with the
results obtained by Hickson et al. (1994) using a $\chi^2$
adjustment of mock LZT galaxy SEDs onto the PEGASE
templates; in contrast to the results of Hickson et al.
(1994), the dispersion in the PCA redshift residuals
increases monotonically with redshift. Tables 8 to 11 also
show that the intense emission lines present in the QSOs
allow us to measure redshifts at all signal-to-noise ratios.
The problem of possible misclassification of QSOs as stars,
mentioned in Sect. 4.2, only affects the completeness of the
sample, but does not affect the quality of redshift
measurements.

The similar redshift residuals for the 4 types of
sub-mock catalogues (ALL, GISSEL, PEGASE, PEGASE
Sal3 in Tables 8, 9, 10 and 11) may appear surprising,
as one would expect more degeneracy when simulations
include independent models of template SEDs. However,
we project the sub-mock catalogues onto the full realistic
catalog, which includes all template SEDs. An interesting
test is to measure, for instance, redshifts from a GISSEL
simulation using a PCA calculated from a PEGASE-based
mock catalog, and to examine whether significant errors
occur. Table 12 shows the systematic error thus introduced
in the measurement of redshifts for signal-to-noise ratios
S/N = 100, 20, 10, and 6; Fig. 18 shows the measured
("observed") redshift versus the true redshift
("theoretical") for S/N = 20. The standard deviation $\sigma_{\rm Res.}$ in the
residuals rises by an order of magnitude for S/N = 100
compared to the results in Table 8, and the average
deviations show a systematic negative offset. For lower
signal-to-noise ratios (S/N = 20, 10, 6), the degradation in the
redshift measurement is less marked as the S/N decreases,
as expected; the systematic negative offset persists and is
largest for S/N = 6. Table 13 shows the classification
efficiency for this simulation: a significant 6.6% of galaxies
are misidentified as stars at a median S/N = 100 and 3%
at a median S/N — 20. The misidentification occurs be- 



Cabanac, de Lapparent, Hickson: Classification and redshift estimation by PCA 
Table 9. Same as in Table 8 for a median signal-to-noise ratio of 20 (see also Fig. 17).
Median S/N = 20

Table 10. Same as in Table 8 for a median signal-to-noise ratio of 10 (see also Fig. 17).

Table 11. Same as in Table 8 for a median signal-to-noise ratio of 6 (see also Fig. 17).

Table 12. Residuals in the galaxy redshift measurements binned by increasing redshift for the simulations including
GISSEL templates only projected onto a PCA including PEGASE templates only, for various median signal-to-noise ratios (see
also Fig. 18).
Definition of Cols.:

#: number of galaxies or QSOs in the redshift bin considered
z: average residual in the redshift
$\sigma_{\rm Res.}$: standard deviation in residual



cause the PEGASE templates do not fill the 10-D PCA 
space in the same way as GISSEL templates do; the effect 
is smaller for lower S/N because noise partly dissipates 
the difference. 



This test therefore shows that classification and red- 
shift measurement are sensitive to the representativity of 
each analyzed spectrum among the sample from which the 
PCA is performed: a subsample of observed SEDs not rep- 



Table 13. Star/Galaxy/QSO classification efficiency for 
a GISSEL-based mock catalog projected onto a PEGASE- 
based PCA (see text for details). 



Fig. 18. Measured redshift versus input redshift using a
$\chi^2$ reduction of the first 10 PCA eigencomponents of a
GISSEL-based mock catalog projected onto a PEGASE-based
PCA. The median signal-to-noise ratio is S/N = 20.
The corresponding standard deviations for z < 1.7 are
listed in Table 12.



resented in the catalog used for the PCA might have sys- 
tematically biased redshifts and classification. This hints 
at the inaccuracies of the current spectral synthesis
models rather than at a real degeneracy problem for the PCA, and
argues for the use of the widest variety of galaxy/QSO/stellar
SEDs when performing a PCA. Instead of performing
the PCA on the observed sample, a better method is
to calculate the eigenvectors from a mock catalog with
the widest variety of galaxy, QSO, and stellar SEDs,
together with a realistic mix of the different object types
and realistic redshift distributions. This was attempted
with the mock catalogues generated here. However, the
systematic differences between the GISSEL and PEGASE
templates show that improved models of spectral libraries
are needed.



6. Discussion 

6.1. Degeneracy in PCA 

We now examine the problem of degeneracies in the PCA,
which is particularly acute when too few eigencomponents
are used. Figures 19 to 21 illustrate the limit of
reconstructing SEDs using only the first 4 eigencomponents
of the PCA. Figures 19 and 20 show LZT SEDs of
galaxies which have different redshifts but similar continua. In
principle, the spectra should be clearly distinguishable
using their local features. But the 4-component PCA
classifies them as "similar" spectra, hence introducing a large
error in the redshift measurement. Using Eq. 17, the PCA
measures z = 0.354 for the pair of SEDs in Fig. 19, which
have true redshifts z = 0.354, z = 0.274, and z = 0.204
for the pair in Fig. 20, with true redshifts z = 0.204,
z = 0.064. Figure 21 shows the PCA reconstruction of the
Fig. 19. SEDs of 2 GISSEL SAB galaxies viewed through
the LZT filter system (see Fig. 1), with high S/N and
slightly different metallicities and redshifts: Salpeter
IMF, metallicity [Fe/H] = 0.004, and z = 0.354 for the
SED shown as a dotted line; and Salpeter IMF, metallicity
[Fe/H] = 0.020 and z = 0.274 for the SED shown as a dashed
line. The 2 SEDs are clearly distinguishable.



SEDs using the first 4 eigencomponents and their
corresponding eigenvectors. The reconstructed SEDs are nearly
identical in each pair. The 4-component PCA washes out
all local features, precluding any finer analysis. We also
estimated the redshift measurement error using only the 
4-component PCA. The redshift accuracies at a median 
Fig. 20. Same as in Fig. 19 using the PEGASE library:
an old Sa galaxy at z = 0.204 (solid line) and an Elliptical
galaxy at z = 0.064 (dot-dashed line).




Fig. 21. The reconstructed SEDs using the first 4
components of the PCA are shown for the 4 galaxies of Figs.
19 and 20 using the same line types. Differentiating
between the 2 SAB galaxies, resp. between the Sa and the
E galaxy, is intrinsically difficult.



S/N = 10 were 3 times larger than those obtained with
the 10-component PCA at the same S/N ratio (see Table 10).

The problem of degeneracies when trying to measure
redshifts using the PCA is linked to an intrinsic limit of
the technique. The PCA is powerful for filtering the noise
in observed spectra, and for extracting both the common
features and the major sources of variance among an
ensemble of SEDs. In the case of galaxy SEDs at zero redshift,
the PCA is well suited for accurate classification, as the
common features of the spectra are the underlying stellar
populations and the characteristic emission and
absorption lines associated with them. In the present case, the
mix of redshifts, which cover a large interval (0 < z < 2
for galaxies), introduces a vast variety of continuum shapes
among the SEDs, and the dominant eigencomponents of
the resulting PCA emphasize the continuum over local
features, which are common to only a few objects (those
at the same redshift).
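
The loss of local features with a truncated basis can be illustrated with a small numerical experiment. The sketch below is not the simulation used in this paper: `toy_catalog` is a hypothetical set of SEDs (a power-law continuum plus one narrow absorption feature that shifts with redshift, sampled in 40 filters), and the function names are invented for the example. It only shows that a reconstruction with 4 eigencomponents leaves a larger residual than one with 10.

```python
import numpy as np


def reconstruct(seds, reference_seds, n_components):
    """Project SEDs on the first n_components eigenvectors and rebuild them."""
    mean = reference_seds.mean(axis=0)
    _, _, vt = np.linalg.svd(reference_seds - mean, full_matrices=False)
    basis = vt[:n_components]
    return mean + ((seds - mean) @ basis.T) @ basis


def rms_residual(seds, reference_seds, n_components):
    """R.m.s. difference between the SEDs and their truncated reconstruction."""
    rebuilt = reconstruct(seds, reference_seds, n_components)
    return float(np.sqrt(np.mean((seds - rebuilt) ** 2)))


def toy_catalog(n_objects=300, n_filters=40):
    """Hypothetical SEDs: power-law continuum with a narrow redshifted feature."""
    wave = np.linspace(4000.0, 10000.0, n_filters)
    z = np.linspace(0.0, 1.4, n_objects)
    continuum = (wave[None, :] / 7000.0) ** (1.0 - 2.0 * z[:, None])
    line_center = 4000.0 * (1.0 + z[:, None])
    feature = 0.3 * np.exp(-0.5 * ((wave[None, :] - line_center) / 150.0) ** 2)
    return continuum * (1.0 - feature)


catalog = toy_catalog()
for k in (4, 10):
    print(f"{k:2d} components: rms residual = {rms_residual(catalog, catalog, k):.4f}")
```

Because the feature sits at a different wavelength for each redshift, it contributes little variance to any single leading eigenvector, which is the behaviour described in the paragraph above.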



Fig. 22. SEDs of 2 PEGASE galaxies viewed through the
LZT filter system (see Fig. 1), with intermediate S/N,
and different ages and redshifts: E with age 1.8 Gyr and
z = 0.800 (solid line), and Sa with age 12 Gyr and z =
0.688 (dot-dashed line); both have similar metallicities and
the IMF of Kroupa (Fioc & Rocca-Volmerange 1997). The
2 SEDs are clearly distinguishable, as are the PCA
reconstructions in Fig. 23.



Fig. 23. Reconstruction of the templates shown in Fig. 22
using 10 eigencomponents of a PCA mock catalog. High
signal-to-noise ratio spectra projected onto such a PCA
would allow us to differentiate the 2 reconstructions, but
low signal-to-noise SEDs would be indistinguishable.



Most of the degeneracies are resolved when using 10
eigencomponents instead of 4, because 10 components
allow one to reproduce the main absorption bands and
discontinuities through the entire range of redshifts. Figures
22 and 23 illustrate the improved description provided by a
10-eigencomponent analysis. Figure 22 shows a PEGASE E
template of 1.8 Gyr at a redshift z = 0.8 (solid line)
superposed on a PEGASE Sa spiral template of 12 Gyr at a
redshift z = 0.688 (dot-dashed line). Figure 23 shows the
PCA reconstruction of the same templates using 10
eigencomponents. High signal-to-noise ratio spectra would be
distinguishable whereas low signal-to-noise spectra might
be misclassified.
Fig. 24. Comparison of the input redshifts $z_{\rm model}$ of a
simulated catalog and the photometric redshift estimates
$z_{\rm phot}$ from the HYPERZ code (Bolzonella et al. 2000)
when using the LZT 40 medium-band filters. The redshift
sequence is tight at all redshifts, showing only a small
degeneracy near redshift 2.



Fig. 25. Same as in Fig. 24 using BVRI broad-band
filters. This figure is reproduced from Fig. 2 of Bolzonella
et al. (2000); dotted lines correspond to $\Delta z$ = 0.2, dashed
lines to $\Delta z$ = 0.5, and thin solid lines to $\Delta z$ = 1, where
$\Delta z = |z_{\rm model} - z_{\rm phot}|$. Conspicuous degeneracies
appear along the redshift sequence.



Such degeneracies have however a weak impact on the
overall performance of the PCA technique, as indicated
in Tables 8 to 11. Catastrophic degeneracies in redshift
affect only 2% of the galaxies, and 10% of the QSOs at all
signal-to-noise ratios considered; these fractions are
calculated as the fractions of objects in each class for which
the difference between the input and measured redshift is
larger than the quadrature sum of the redshift residuals
corresponding to the 2 redshift values (listed in Tables
8 to 11). As the degeneracy sometimes only affects the redshift
and not the spectral type (see Fig. 19), the degeneracies
in redshift yield degeneracies in spectral type for only a
sub-fraction of the objects.
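
The criterion used above for a catastrophic redshift error can be written down explicitly. The sketch below is an illustration under assumptions: the function names are hypothetical, and the binned residuals passed in `bins` are placeholder values, not the entries of Tables 8 to 11. An object is flagged when the difference between input and measured redshift exceeds the quadrature sum of the residuals tabulated at the two redshifts.

```python
import math


def sigma_for(z, bins):
    """Residual dispersion of the redshift bin containing z.

    bins is a list of (z_low, z_high, sigma_res) tuples.
    """
    for z_low, z_high, sigma in bins:
        if z_low <= z < z_high:
            return sigma
    raise ValueError(f"redshift {z} is outside the tabulated bins")


def catastrophic_fraction(z_input, z_measured, bins):
    """Fraction of objects whose redshift error exceeds the quadrature threshold."""
    n_bad = 0
    for z_in, z_out in zip(z_input, z_measured):
        threshold = math.hypot(sigma_for(z_in, bins), sigma_for(z_out, bins))
        if abs(z_out - z_in) > threshold:
            n_bad += 1
    return n_bad / len(z_input)


if __name__ == "__main__":
    # Placeholder dispersions, for illustration only.
    example_bins = [(0.0, 1.0, 0.03), (1.0, 2.0, 0.10)]
    z_in = [0.30, 0.50, 0.70, 1.20]
    z_out = [0.31, 0.52, 1.50, 1.25]
    print(catastrophic_fraction(z_in, z_out, example_bins))
```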

Note that if the catalog used to calculate the PCA is
different from the observed catalog for which the
classification, type and redshift measurements are planned to be
performed, a pre-requisite is to introduce in the source
catalog for the PCA error functions similar to those in the data.
Projecting an observed catalog with high signal-to-noise
ratio onto a PCA calculated from low signal-to-noise
spectra would merely result in the loss of information present
in the data, whereas projecting an observed catalog with
low signal-to-noise onto a PCA derived from high
signal-to-noise data would result in catastrophic errors.

6.2. Comparison with photometric redshifts 

The galaxy redshift accuracy $\sigma_{\rm Res.}$ ~ 0.01-0.05 (for
median S/N > 10) obtained here with the PCA should
not be attributed to the PCA technique itself, but to the
dense sampling of the SEDs by the LZT medium-band
filters (note that $\sigma_{\rm Res.}$ ~ 1/R with the LZT filter
resolution R ~ 40, see Fig. 1). We illustrate this point by
using another method to measure the redshifts of
"LZT-observed" galaxies. We use the published code HYPERZ
(Bolzonella et al. 2000) for measuring "photometric
redshifts" by standard $\chi^2$ fits to a spectral library. The model
catalog of SEDs was kindly provided by R. Pello and is
described in Bolzonella et al. (2000): it contains the SEDs
with a S/N = 10 of 1000 galaxies generated using
version GISSEL98 of the GISSEL spectral library (Bruzual
and Charlot, 1993; see Sect. 2.3.2). This sample was then
"observed" through the LZT filters. Figure 24 shows a
comparison between the input redshifts $z_{\rm model}$ of the
resulting catalog and their photometric estimates $z_{\rm phot}$.
In the redshift range of interest for the LZT galaxies
(0 < z < 1.5), the r.m.s. dispersion in the redshift errors
$\Delta z = |z_{\rm phot} - z_{\rm model}|$ (after rejection of the few
"catastrophic detections", defined by $\Delta z$ > 1, and occurring
only at z < 0.5), is 0.036 in the interval 0 < z < 0.5,
0.018 in the interval 0.5 < z < 1.0, and 0.033 in the interval
1.0 < z < 1.5. Note that these dispersions are comparable
with the values obtained with the PCA in the full redshift
range 0 < z < 1.5 (see Table 10).
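
The dispersion statistic used in this comparison can be sketched as follows. The function name `rms_dispersion` is hypothetical, and the r.m.s. is taken about zero, which is an assumption of this sketch since the text does not say whether the mean offset is subtracted. Objects with an error above the rejection threshold (here 1, as in the text) are counted separately as catastrophic detections.

```python
import math


def rms_dispersion(z_model, z_phot, z_low, z_high, reject_above=1.0):
    """R.m.s. of |z_phot - z_model| in one redshift interval.

    Returns (rms, n_rejected), where n_rejected is the number of
    catastrophic detections excluded from the r.m.s.
    The r.m.s. is computed about zero (assumption of this sketch).
    """
    errors = [
        abs(zp - zm)
        for zm, zp in zip(z_model, z_phot)
        if z_low <= zm < z_high
    ]
    kept = [dz for dz in errors if dz <= reject_above]
    if not kept:
        raise ValueError("no object left in this redshift interval")
    rms = math.sqrt(sum(dz * dz for dz in kept) / len(kept))
    return rms, len(errors) - len(kept)


if __name__ == "__main__":
    # Made-up redshifts, for illustration only.
    model = [0.10, 0.20, 0.30, 0.40, 0.80]
    phot = [0.13, 0.16, 0.30, 1.90, 0.81]
    print(rms_dispersion(model, phot, 0.0, 0.5))
```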

For comparison, we replicate in Fig. 25 the BVRI
panel of Fig. 2 from Bolzonella et al. (2000), showing the
dispersion in the photometric redshifts calculated from
broad-band magnitudes with an uncertainty $\Delta m$ = 0.1
(which approximately matches the S/N = 10 in the LZT
SEDs used in Fig. 24): in the redshift range 0 < z < 1.5,
the r.m.s. uncertainty in $z_{\rm phot}$ is in the range 0.2-0.3 (see
Table 2 of Bolzonella et al. 2000), 10 times larger than with
the LZT filters. The variance in the redshift errors is also
calculated after rejection of the "catastrophic detections"
with $\Delta z$ > 1. These correspond to degeneracies
analogous to those described here in Sect. 6.1, and mainly occur
in the redshift intervals 0 < z < 0.4 and 1.5 < z < 4.5.
We emphasize that the redshift sequence obtained with the
LZT filters in Fig. 24 is much cleaner than that obtained with the
broad-band filters in Fig. 25. As both sets of filters span
the same wavelength range, this comparison provides a
direct demonstration of the general advantage of using
the LZT medium-band filters over the traditional use of
broad-band filters, in proportion to the resolution of the
filter system used, and at the expense of observing time.
An intermediate approach in which a wide-band system is
complemented by medium-band filters was also proposed
by Budavari et al. (2001), with a moderate gain in redshift
accuracy (the error is decreased by 34% for z < 1.3).



6.3. Improvements 

The measurements of physical parameters using LZT
SEDs might be improved by complementing the PCA
with a $\chi^2$ technique for measuring the redshifts. Based
on spectral libraries such as those used here, a $\chi^2$
analysis measures redshifts to $\sigma_{\rm Res.}$ $\lesssim$ 0.02 at z $\lesssim$ 1 for
S/N = 10 (Hickson et al. 1994). It is however very
sensitive to the noise in the SEDs, as it makes use of all the
information carried in them. This approach also provides
a way to perform the object classification (note that at
redshift z $\lesssim$ 0.5, comparison of the object profiles with
the point-spread function could be used for differentiating
stars from galaxies, as discussed in Sect. 4.1). One could
use the $\chi^2$ technique to control the redshifts calculated
by the 10-component PCA, and thus identify the
degenerate SEDs for which better redshifts could be measured.
Alternatively, one could consider a 2-step analysis in which
the $\chi^2$ technique is first applied to the observed SEDs in
order to measure their redshifts, and in a second step,
the PCA is applied to the SEDs "blue-shifted" to z = 0,
with the goal of obtaining a reliable spectral classification for
the identified objects, and of generating "noiseless"
reconstructed SEDs. An iterative process using both techniques
might be required and would contribute to improving the
object classification and redshift measurement.

Because observing objects at widely different redshifts
results in rest-wavelength SEDs covering differing
wavelength intervals, the basic PCA technique described above,
which requires that objects be covered by a common
wavelength interval, could not be used on the SEDs
"blue-shifted" to z = 0. One could consider performing
separate PCAs on sub-samples of rest-wavelength SEDs
selected by redshift interval. Studying the evolution of the
galaxy populations with redshift would however require
comparing the SEDs at different redshifts over a common
wavelength interval. A technique recently developed by
Connolly and co-workers uses the reconstructing ability of
the PCA to fill the gaps in data with varying wavelength
intervals (Connolly & Szalay 1999). A complementary
approach which creates the spectral templates from
multicolor surveys (Budavari et al. 2000) could also be applied.
Overall, the resulting PCA would allow both a reliable
spectral classification of all objects, and reconstruction of
the missing regions of the spectra under the null-evolution
hypothesis (which the PCA would implicitly do). By
comparison of how the different spectral types are populated
at different redshifts, such a PCA would allow one to
examine and quantify the redshift evolution of the galaxy
population.

7. Conclusion 

This paper describes an application of Principal
Component Analysis (PCA) to a simulated multicolor
survey using the 40 medium-band filters of the Large
Zenith Telescope. For that purpose, we generate
realistic mock catalogues of ~ 3000 stars, ~ 30 000 galaxies,
and ~ 1 000 QSOs. For stars, we use templates from the
library of Pickles (1998) and the phenomenological model
of star counts of Bahcall & Soneira (1986). For galaxies,
we use spectral energy distributions (SEDs) from GISSEL
(Bruzual & Charlot 1993) and PEGASE (Fioc &
Rocca-Volmerange 1997) and a luminosity function derived from a
review of the most recent R-band luminosity functions in
the literature. We choose the
CFRS-type evolution in the galaxy luminosity function,
with luminosity evolution of only the late-type galaxies.
For QSOs, we use an extrapolation of the composite
spectrum of Francis (1991), and the luminosity function of the
2dF QSO survey.

Using the realistic mock catalogues, we perform a PCA
and extract the first 10 eigencomponents. The 10-D space
allows one to separate efficiently stars, galaxies, and QSOs
even at low signal-to-noise ratios. 98% of stars, 100% of
galaxies and 93% of QSOs are classified correctly at a
median signal-to-noise ratio S/N = 6. These values increase
to 100% of stars, 100% of galaxies and 100% of QSOs at a
median S/N = 10. For SEDs with a median S/N = 6, the
10-component PCA also provides a measurement of
redshifts accurate to $\sigma_{\rm Res.}$ $\lesssim$ 0.05 for galaxies with z $\lesssim$ 0.7,
and to $\sigma_{\rm Res.}$ $\lesssim$ 0.2 for QSOs with z > 2. At a median
S/N = 20, $\sigma_{\rm Res.}$ $\lesssim$ 0.02 for galaxies with z $\lesssim$ 1 and for
QSOs with z $\gtrsim$ 2.5 (for a given median S/N, a 10 to 30
times lower S/N is expected at the extreme wavelengths of
the bluest/reddest objects). This is not sufficient for
small-scale 3-D clustering analyses, but perfectly adequate for
luminosity function studies, and for measuring the
evolution with redshift in the large-scale clustering using
projected moments.

This paper also underlines the main weakness of the
PCA. It is well-known that age, star-formation rate,
redshift, and dust extinction produce degenerate SEDs at a
resolution R ~ 40. Although the PCA is not able to
resolve some intrinsic degeneracies due to the medium-band
observing technique, it efficiently reduces the noise in the
SEDs at the expense of additional degeneracies. The
solution to this problem may lie in the combination of a
PCA with a standard $\chi^2$ fitting procedure. Another
crucial issue in the use of the PCA for type/class/redshift
measurement is to calculate the eigenvectors from a
sample in which each type of object is sufficiently well
represented. The use of such a catalog, constructed from a
combination of a wide variety of well calibrated observed
SEDs together with precise evolutionary models, will
guarantee the best results for PCA analyses. These reference
samples will also allow detection of new types of objects,
as these will significantly deviate from the sequences of
known objects.
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